TRANSLATORS' PREFACE


Akhmanova's  Preface  explains  why  this  book  was  published  in 
the  Soviet  Union.  All  that  remains  is  for  us  to  explain  why  we 
have  translated  it. 

Computational  linguistics,  or  mathematical  linguistics,  is  de- 
veloping at  least  as  rapidly  in  the  United  States  as  in  the 
U.S.S.R.  Here,  too,  it  suffers  the  growing  pains  that  Mel'chuk 
describes  in  Chapter  IV.  Linguists  tend  to  frown  a  bit  at  the  com- 
puter, to  sneer  somewhat  at  statistics  and  information  theory; 
but  perhaps  they  will  be  interested  in  reading  the  views  of  lin- 
guists who  find  statistics  and  even  electronic  computing  ma- 
chines useful.  One  purpose  of  the  translation,  therefore,  is  to 
make  a  linguistic  introduction  to  the  new  field  accessible  to 
Western  scholars. 

Another  purpose  is  to  disseminate  a  rather  full  and  probably 
accurate  view  of  today's  most  influential  Soviet  work  in  compu- 
tational linguistics.  This  survey  no  doubt  represents  the  best 
work  presently  being  done  in  this  field  in  the  U.S.S.R.;  hence 
it  is  not  a  representative  sample.  On  the  other  hand,  the  future 
of  the  field  is  sure  to  be  influenced  more  by  the  substance, 
amount,  and  quality  of  the  best,  rather  than  the  average,  current 
work.  A  reasonable  prediction,  using  that  premise  and  the  infor- 
mation in  this  book,  is  that  computational  linguistics  in  the 
U.S.S.R.  will  develop  rapidly  and  produce  a  fair  share  of  the 
world's  knowledge  in  the  field. 

This book is, nevertheless, fairly weak in some respects. The
linguist whose curiosity and interest are stimulated by the
chapters on statistics and information theory should consult a
statistician or mathematician at once, before attempting any
applications himself. Better techniques are available, and more
significant applications are possible. But Chapters V and VI are
not offered as handbooks; as introductions, they may be more
acceptable than would be mathematically sophisticated but
linguistically naive treatments.

This  translation  was  prepared  in  support  of  a  continuing 
program  of  research  in  linguistics  and  machine  translation  un- 
dertaken for  the  United  States  Air  Force  by  The  RAND  Cor- 
poration. 

D.  G.  H. 

D.  V.  M. 


PREFACE 


The  present  book  is  an  attempt  to  throw  some  light  on  several 
results  attained  by  science  in  the  area  of  the  applications  of  ex- 
act methods  to  linguistic  research.  The  concept  of  exact  meth- 
ods in  science,  and  of  exact  sciences,  is  inseparably  bound  with 
mathematics— whence  the  expression  "mathematical  linguistics" 
to  designate  this  new  direction  in  linguistic  research.  It  is 
hardly  proper,  however,  to  elevate  this  expression  to  the  level 
of  a  technical  term,  since  such  a  term  could  lead  to  a  distorted 
conception  of  the  nature  of  the  question.  The  essence  of  this  di- 
rection and  its  real  content  consist  not  of  creating  some  special 
kind  of  "linguistics,"  but  rather  of  perfecting,  of  making  accu- 
rate, reliable,  and  modern,  the  methods  of  linguistic  research 
in the usual meaning of the word. Thus, it is clear that the
authors would prefer to eliminate the phrase "mathematical
linguistics" from the title of the present book.¹ It is impossible,
however, to ignore the fact that this term has already attained
a certain popularity, and therefore it seems inexpedient to
avoid it altogether.

The book has four sections,² dealing with the following topics:

(a) Those questions of general linguistics that must be clarified
if the discussion of concrete methods of exact study and the
description of linguistic phenomena, in the following chapters,
are not to seem too far removed from earlier linguistics.

(b) The place and role of machine translation in contemporary
linguistics in a theoretical-linguistic as well as in a practical
sense.

(c) Possible applications of statistical methods to linguistic
research, together with a discussion of the basic principles of
statistical analysis and such basic statistical concepts as random
event, frequency, and evaluation of accuracy.

(d) Possible applications of information theory to language
study.

¹ [We have done so. Tr.]

² [Chapters I, II, III; IV; V; and VI. Tr.]

Nowhere in the present book have we treated linguistic
applications of "nonquantitative" mathematics, in particular
mathematical logic. This large question requires separate study.

The amount of factual detail that has been developed in
different areas of language study by exact methods is not uniform,
and this fact has influenced the content of the corresponding
sections of the book. Thus, in Chapter IV, it has proved possible
to discuss machine translation rather fully, not only presenting
many of its theoretical and practical problems but also
summarizing basic approaches to the solution of these problems.
The same point applies in essence to Chapter VI as well, where we
have presented a rather detailed analysis of studies dealing with
the application of the methods of information theory in language
study.

On  the  other  hand,  in  describing  the  role  of  statistical  meth- 
ods in  linguistic  research,  it  has  been  more  convenient  to  re- 
duce the  critical  survey  of  the  literature  to  a  minimum  and  to 
concentrate  on  the  basic  concepts  of  statistics  and  the  basic 
principles  of  statistical  analysis.  The  reason  for  such  an  organi- 
zation of  the  fifth  chapter  is  that  statistical  methods  have  been 
applied  in  linguistics  for  a  rather  long  time,  and  the  literature 
in  this  field  is  so  large  and  specialized  that  any  thorough  criti- 
cal review  of  it  would  have  led  to  an  unjustifiably  great  en- 
largement of  the  book  and  a  distinct  disproportion  in  its  parts. 
Chapters  I  and  II  were  written  by  O.  S.  Akhmanova,  Chapters 
III  and  IV  by  I.  A.  Mel'chuk,  Chapter  V  by  R.  M.  Frumkina,  and 
Chapter  VI  by  E.  V.  Paducheva. 

O.  Akhmanova 
CHAPTER I


Can Linguistics Become an
Exact Science?


"Ever  larger  areas  of  science  are  undergoing  a  salutary  infusion 
of  mathematics;  an  ever  greater  portion  of  the  sciences  is  going 
over  into  the  ranks  of  the  exact.  One  can  foresee  a  swift  de- 
velopment of  this  process  in  the  decade  ahead."  (From  an 
article  by  Acad.  A.  N.  Nesmeyanov,  "A  Look  at  the  Tomorrow 
of  Our  Science,"  Pravda,  January  1,  1960.) 


1.  Linguistic  "Content"  and  Linguistic  "Expression" 

High-speed electronic computers have given all areas of
knowledge analytic means of astonishing capability. The
"electronic brain" makes possible the solution of problems
formerly not open to calculation.¹

Among the basically new areas for the application of electronic
computers, machine translation and automatic information
retrieval occupy an important place. The first models of
machines for automatic translation from one language into
another, and of information machines, which collect a huge store
of knowledge in their "memories" and put it out on demand in
any sequence of combinations, have already been created; these
machines have a truly great future.²


¹ Bibliographic references for Chapters I and II will be found at the end of
Chapter II; similarly, those for Chapters III and IV will be found at the end of
Chapter IV. The bibliographies for Chapters V and VI follow each of the chapters.

² Here is what Acad. A. N. Nesmeyanov says on this subject in the article
cited above: "The dream of information machines is not unfounded, especially
if we recall that at present scientific knowledge—in the natural and technical
sciences alone—is communicated throughout the world in tens of thousands of
journals. Chemistry alone occupies more than ten thousand journals.
Frequently, it is extremely difficult to find this or that fact at its hiding place in
this ocean of scientific literature. I remember a statement by a scientific worker
in an American firm to the effect that if a scientific task costs less than several
hundred thousand dollars it is easier to redo it than to search for it in the
literature.

"Scientific-information machines undoubtedly have a great future."

These two problems, machine translation and automatic
information retrieval, are alike in that their solution demands a
basically new approach to language, the development of
specialized methods of research on, and description of, language.
This new approach can be briefly defined in the following
fashion. One must learn how to represent grammatical, lexical,
lexico-phraseological, and other regularities of language in such a
form that one can input them directly to the apparatus. In
other words, it is necessary to "decode" the processes with
which language communication is performed. Mathematics,
with its inexhaustible possibilities, must provide the basis for
a much deeper penetration into the "mechanism" of language
and for a thoroughly strict and logical, fully "rational"
description of the regularities it uncovers.

As is well known, language is the most important means of
human communication; but, at the same time, it is the
immediate activity of thought, the tool of development and struggle.
Therefore, it is perfectly correct to approach language from
different although closely interrelated directions. Concentrating
on the communicative function of language, one quite rightly
represents it primarily as a form of communication; hence, one
can justifiably attempt to consider language merely as a vehicle,
a structure for the transmission of previously prepared
communications, and even simply as an indication of the existence
of internal and external experience.

Language, however, is not just a vehicle for the transmission
of prepared thoughts; it is the action of thought itself.
Therefore, if language is a vehicle, it is one that not only facilitates
mutual understanding but also helps regulate thought,
organize experience, and develop social self-consciousness.
Only through the discovery of all facets of language can one
achieve  a  full  understanding  of  its  nature  as  a  unique  social 
phenomenon.  Language  should  be  studied  in  connection  with 
research  on  the  causal  bonds  between  linguistic  communica- 
tion and  the  facts  of  the  social  life  of  its  creators  and  bearers, 
with  their  history,  culture,  and  literature.  There  is  not  the  least 
doubt  that  modern  exact  methods  of  research  will  soon  perme- 
ate all  areas  of  our  science,  and  that  linguistics,  in  the  fullest 
and  broadest  sense  of  the  word,  will  assume  an  entirely  modern 
aspect.  But  for  the  present,  when  only  the  first  steps  are  being 
taken  in  this  new  direction,  a  quite  definite  and  deliberate  re- 
striction and  confinement  of  the  area  of  research  is  unavoida- 
ble. Concretely,  as  will  become  clear  from  the  following  out- 
line and  description  of  the  present  condition  of  science  in  the 
area  under  consideration,  research  is  confined  to  two  areas: 

(a)  limited  and  specific  spheres  of  application  of  language, 
namely,  the  language  of  the  exact  sciences,  especially  mathe- 
matics; 

(b)  only  the  communicative  ("intellectual")  function,  the 
description  of  problems  of  intellectual  communication  as  ab- 
stracted from  the  emotional,  aesthetic,  volitional,  and  other  as- 
pects of  language. 

Limitation  of  research  initially  to  the  intellectual-communi- 
cative function  of  language  seems  to  give  one  the  right,  for 
given  specific  purposes,  to  consider  language  as  a  particular  kind 
of  "code  system,"  while  the  actual  "products  of  speech,"  formed 
from  the  elements  of  a  given  "code"  and  bearing  definite  "in- 
formation," may  be  considered  to  be  "communications"  that 
have  a  unique  and  precisely  definable  relation  to  this  code 
system.  But  it  proves  to  be  an  extremely  difficult  matter  to  ap- 
ply this  approach  in  practice.  Ordinary  human  language  is  not 
a  code  such  as  necessarily  presumes  a  one-to-one  correspondence 
between  a  fully  defined  content  and  a  definite  expression. 
Hence,  the  most  diverse  difficulties  and  complications  are  ines- 
capable. 

In languages abounding in so-called synthetic forms, the
absence of a simple one-to-one correspondence between the
designator and that which is designated, or between expression
and content, is most graphically evident. For example, in the
Russian em [I eat], as in all nonproductive formations in
general, it is impossible synchronically to separate out those parts
of the designator, or expression, that would correspond to such
designata, or contents, as (a) the concept of the process of
eating, (b) indicative mood, (c) first person, (d) singular, (e) present
tense. Nor can simple one-to-one correspondences be formulated
for regular, or productive, formations, between the elements of
such forms as, for example, zval [called], bral [took], dal [gave],
and the various contents expressed by them: tense, person,
number, voice, and mood, in conjunction with specified material
meanings.

Considered  theoretically,  each  of  the  separate  meanings  con- 
tained in  compounded  complexes  similar  to  those  just  men- 
tioned can  be  abstracted,  or  intellectually  separated  out,  into  a 
sort  of  "minimal  unit  of  meaning,"  and  it  is  quite  natural  to 
propose  that  those  using  language  also  intuitively  perform  an- 
alytic operations  of  a  similar  type.  But  if  such  operations  are  to 
be  transferred  from  the  area  of  intuition  into  the  area  of  logic 
and  rationality  (and  no  machine  can  operate  otherwise),  a  defi- 
nite minimal  unit  of  expression  must  be  made  to  correspond 
regularly  and  sequentially  to  each  minimal  unit  of  meaning.  As 
one  can  see  from  the  examples  cited  above,  however,  in  natural 
human  language  several  minimal  units  of  meaning  often  corre- 
spond to  one  minimal  unit  of  expression.  The  picture  is  even 
more  complex  if  one  considers  that  very  frequently  one  and 
the  same  minimal  unit  of  meaning  turns  out  to  be  embodied 
in  several  completely  different  minimal  units  of  expression— a 
fact  that  is  observable,  for  example,  in  the  co-occurrence  in  lan- 
guage of  various  types  of  noun  declensions,  different  ways  of  ex- 
pressing the  same  categories  of  verbs,  etc. 
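
The two kinds of mismatch just described can be made concrete with a small sketch. The Python fragment below is only an illustration and not a procedure proposed in the text: it records which units of meaning are carried by which unit of expression for the form em and for the ending of nesu [I carry], and then tests whether the pairing is the one-to-one correspondence that a code in the strict sense would require. The feature labels, the segmentation of nesu into a stem plus "-u", and the helper names are assumptions introduced for the example.

```python
from collections import defaultdict

# Hypothetical (unit of expression, unit of meaning) pairs.
# "em" is treated as unsegmentable; "-u" is the assumed ending of nesu.
PAIRS = [
    ("em", "process of eating"),
    ("em", "indicative"),
    ("em", "first person"),
    ("em", "singular"),
    ("em", "present"),
    ("-u", "first person"),
    ("-u", "singular"),
    ("-u", "present"),
]


def expression_to_meanings(pairs):
    """Group the units of meaning carried by each unit of expression."""
    table = defaultdict(set)
    for expression, meaning in pairs:
        table[expression].add(meaning)
    return dict(table)


def meaning_to_expressions(pairs):
    """Group the units of expression that can embody each unit of meaning."""
    table = defaultdict(set)
    for expression, meaning in pairs:
        table[meaning].add(expression)
    return dict(table)


def is_one_to_one(pairs):
    """True only if every expression has one meaning and vice versa."""
    forward = expression_to_meanings(pairs)
    backward = meaning_to_expressions(pairs)
    return (all(len(m) == 1 for m in forward.values())
            and all(len(e) == 1 for e in backward.values()))
```

Under these assumptions, `is_one_to_one(PAIRS)` is false for two independent reasons: "em" carries five units of meaning at once, and "first person" is embodied in two different units of expression. Either reason alone is enough to keep ordinary language from behaving like a strict code.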

Thus far, we have used only two concepts, meaning (content)
and expression; i.e., for the initial presentation of a problem
it has more or less been taken for granted that we were
dealing with just two aspects of linguistic units. But, in fact,
linguistic research and description become possible only with
a fully detailed and distinct delineation of "expression" as the
external sound capsule of morphemes, on the one hand, and of
the sounds in a language, considered as members or elements
of its phonological system, on the other. For example, t or l, in
such words as plot [raft], zhaket [jacket], portret [portrait],
bal [ball, dance], avral [the command for "all hands on deck"],
fal [lift, a part of a ship's rigging], on the one hand, and
beret [takes], neset [carries], vedet [leads], zval [called],
bral [took], upal [fell], on the other, will be treated entirely
differently by a language. In the first six words, these sounds are
not to be separated out as minimal units of expression,
corresponding to minimal units of content; i.e., they do not
"morphologize." But in the second group, they emerge quite
consistently and regularly as the outer wall of the sound capsules
defined as "units of content."
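
The contrast between the two groups of words can be restated as data. The sketch below is a minimal illustration, assuming a hand-made list in which each word is tagged with its final sound and with a flag saying whether the text places it in the group where that sound is a separate unit of expression. The list layout and the function names are hypothetical; the flags merely repeat the text's own grouping.

```python
# (word, gloss, final sound, final sound is a separate unit of expression?)
WORDS = [
    ("plot", "raft", "t", False),
    ("zhaket", "jacket", "t", False),
    ("portret", "portrait", "t", False),
    ("bal", "ball, dance", "l", False),
    ("avral", "all hands on deck", "l", False),
    ("fal", "part of a ship's rigging", "l", False),
    ("beret", "takes", "t", True),
    ("neset", "carries", "t", True),
    ("zval", "called", "l", True),
    ("bral", "took", "l", True),
    ("upal", "fell", "l", True),
]


def statuses_by_sound(words):
    """Collect, for each final sound, the set of statuses it is found with."""
    result = {}
    for _word, _gloss, sound, is_unit in words:
        result.setdefault(sound, set()).add(is_unit)
    return result


def sounds_with_both_statuses(words):
    """Sounds that appear both as a mere phoneme and as a unit of expression."""
    table = statuses_by_sound(words)
    return sorted(sound for sound, seen in table.items() if len(seen) > 1)
```

On this list both "l" and "t" come back from `sounds_with_both_statuses(WORDS)`, which is the point of the paragraph: the identity of the sound alone does not settle whether it "morphologizes."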

Between these two basically different phenomena, there lies
the following empirically established difference. The sounds
in a language, as elements of its phonological system, lend
themselves to comparatively easy enumeration. In any given
language, the number of functionally distinct sound units
(phonemes) is quite small (20 to 60).

It  is  an  entirely  different  matter  when  certain  sounds  belong 
not  simply  to  the  category  of  "distinguishers"  but  to  the  cate- 
gory of  concrete,  "individually"  fixed  capsules  of  only  these 
and  no  other  morphemes  (e.g.,  the  t  in  beret  [takes]  or  in 
razbityj  [broken],  the  u  in  nesu  [I  carry]  or  in  bratu  [to  a 
brother],  the  a  in  zhena  [wife]  or  in  stola  [of  the  table]  or  in 
doktora  [the  doctors]).  Here,  it  becomes  extremely  difficult  to 
take  an  inventory  of  these  morphemes,  and  their  number  is  in- 
definitely great.  Units  of  expression  of  this  type,  being  "ambiv- 
alent," are  immediately  associated  with  units  of  content  or 
meaning;  and  these  latter  units  are  correlated  with  reality,  re- 
flecting the  multiplicity  of  these  relations  in  which  the  most  di- 
verse phenomena  are  found. 


2.  "Homoplanar"  and  "Heteroplanar"  Approaches  to  the 
Treatment  of  the  Question  of  "Sound"  and  "Meaning" 

From  what  has  been  said,  it  should  be  clear  that  clarifying  the 
relation  between  the  content  (meaning)  of  linguistic  units  and 
their  expression  (especially  their  sound)  is  the  main  problem 
of  contemporary  linguistics.  Therefore,  it  is  natural  that  the 
"homoplanar,"  mechanical  concept  of  descriptive  linguistics 
and  the  "heteroplanar,"  immanent  concept  of  glossematics 
have  long  been  the  objects  of  serious  criticism  from  various 
points of view, including that of Soviet linguistics, particularly
in  relation  to  the  treatment  of  the  question  of  content  and  ex- 
pression, explicit  and  implicit,  that  these  systems  accept.  In  in- 
sisting that  linguistic  study  and  description  proceed  without 
analysis  of  the  meaning  of  the  registered  units  (since  this  would 
demand  a  consideration  of  processes  not  reducible  to  "opera- 
tionalistic"  research),  the  proponents  of  descriptive  linguistics 
seem  to  oppose  not  only  the  "psychological"  and  "logistic"  tend- 
encies of  the  nineteenth  century  but  also  contemporary  neo- 
saussurianism,  which  postulates  a  double  nature  for  linguistic 
signs,  demands  a  heteroplanar  approach  to  linguistic  pheno- 
mena, and  insists  that  the  object  of  linguistics  cannot  be  lim- 
ited to  a  spoken  and  syntagmatic  level  alone  but  must  neces- 
sarily include  a  paradigmatic  level  also.  In  one  particular  aspect 
of  neosaussurianism— in  glossematics  (as  distinguished  from  de- 
scriptive linguistics)— it  would  seem  that  the  question  of  mean- 
ing as  related  to  sound  occupies  an  especially  large  area,  and 
is,  in  fact,  the  basis  of  the  whole  theoretico-linguistic  construc- 
tion. 

The foregoing situation apparently follows from the fact
that the basic method of glossematics is that of "commutation,"
or the "commutation test," the application of which permits
the discovery of an "invariant" of language, through
determination of the correlation between the levels of expression and
content. Actually, this does not hold. In glossematics, the levels
of expression and content are not at all the same thing as sound
and meaning in the usual and natural sense of these words. In
the same way, the special glossematic disciplines, "kenematics"
and "plerematics," are not at all the same as phonetics and
semasiology. It is basic for the two levels that together compose
the semeiological invariant that they be functors (members) of
a given function,³ which is why the names "level of content"
and "level of expression" have a conditional character. Besides,
with regard to "kenemes" and "pleremes," glossematics
carefully distinguishes between the "form" of expression and
content, on the one hand, and their "substance," on the other. The
latter (i.e., the substance of expression and of content) are
fully determined by their form, and even exist only as a result
of the form ("solely by its grace," udelukkende af dens naade)
and can never in any case be considered as having independent
existence.

³ In the terminology of Hjelmslev, the word "function" is used to mean
relation [12].
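
The idea behind a commutation test can be sketched in a few lines, with one important reservation taken from the paragraph above: the sketch uses sound and meaning in the everyday sense, which the text says is not what glossematics means by its levels of expression and content. It should therefore be read as a rough analogy. The toy lexicon (built from words cited earlier in this chapter), the treatment of each letter as one sound, and the function name `commutes` are all assumptions made for the illustration.

```python
# Toy lexicon: transliterated form -> gloss. Words are taken from the
# examples cited earlier in the chapter; the lexicon is deliberately tiny.
LEXICON = {
    "bal": "ball, dance",
    "fal": "part of a ship's rigging",
    "bral": "took",
    "zval": "called",
}


def commutes(a, b, lexicon):
    """Return True if exchanging sound a for sound b in some word of the
    lexicon yields another word of the lexicon with a different gloss.

    Assumes every sound is written with exactly one character.
    """
    for word, gloss in lexicon.items():
        for i, sound in enumerate(word):
            if sound != a:
                continue
            candidate = word[:i] + b + word[i + 1:]
            if candidate in lexicon and lexicon[candidate] != gloss:
                return True
    return False
```

Here `commutes("b", "f", LEXICON)` holds because exchanging the first sound of bal gives fal, with a different content. `commutes("b", "z", LEXICON)` fails, but only because this small lexicon offers no such pair; a negative result from a toy word list proves nothing about the language.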

In  fact,  glossematics  allows  one  to  perform  an  experiment 
that  includes  comparison  of  different  languages  with  the  pur- 
pose of  extracting  that  which  they  have  in  common,  regardless 
of  which  languages  are  subjected  to  comparison.  That  which  is 
in  common  is  designated  by  the  Danish  term  mening.  But, 
again,  mening  does  not  mean  the  same  ("meaning")  as  in  nor- 
mal usage;  this  can  be  seen  from  the  fact  that  in  English  trans- 
lation the  Danish  mening  is  not  to  be  rendered  by  its  etymologi- 
cal analog— i.e.,  by  the  word  "meaning"— but  by  the  word  "pur- 
port"—"understood  by,"  "bringing  to  mind":  that  which  is 
brought  to  mind  by  the  use  of  this  or  that  unit  of  language— i.e., 
that  which  is  contained  in  the  "intent"  of  the  speaker  transmit- 
ting the  linguistic  communication. 

The  position  that  the  certain  something  contained  in  that  in- 
tent—the purport  of  what  is  said— is  in  itself  amorphous  and 
indefinite,  and  takes  on  clear  definition  only  after  the  form  of 
the  content  of  this  or  that  language  organizes  it— so  much  was 
already  formulated  by  De  Saussure  ([9],  p.  112  et  seq.).  In 
glossematics  this  position  underwent  further  development,  and 
assumed  an  important  place. 

A  portion  of  the  spectrum  is  usually  cited  as  explanation  of 
what  has  been  said  about  substance  and  form.  For  example: 


Language    Spectrum

English     green | blue | gray | brown
Welsh       glas | llwyd
Russian     zelenyj | sinij | goluboj | seryj | korichnevyj

Mening,  in  this  instance,  is  the  section  of  the  spectrum  itself. 
It  is  understood,  or  contained,  in  the  intent;  it  is  the  "purport" 
of  the  utterance.  But  an  utterance  can  be  realized  only  when 
language  "throws  its  net"  over  amorphous  "purport,"  and  gives 
form to the amorphous substance of content, i.e., in this
instance, arbitrarily splits it into two, four, five, or some other
number of parts.⁴
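
The "net" that each language throws over the same stretch of the spectrum can be written down as a simple data structure. The sketch below is an illustration only: it stores the color terms cited above for each language, in order, and counts the parts into which that language divides the stretch. It does not model where the boundaries fall, since the text does not give them; the variable and function names are hypothetical.

```python
# Color terms cited in the text for one and the same stretch of the
# spectrum, listed in the order given there.
SPECTRUM_TERMS = {
    "English": ["green", "blue", "gray", "brown"],
    "Welsh": ["glas", "llwyd"],
    "Russian": ["zelenyj", "sinij", "goluboj", "seryj", "korichnevyj"],
}


def partition_sizes(terms_by_language):
    """Number of parts into which each language splits the stretch."""
    return {language: len(terms)
            for language, terms in terms_by_language.items()}
```

Calling `partition_sizes(SPECTRUM_TERMS)` gives 4 for English, 2 for Welsh, and 5 for Russian, matching the "two, four, five" of the text. The substance (the spectrum) is the same in all three cases; only the form imposed on it differs.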

The examples cited make it possible to elucidate the
general concept of the relation between form and substance in
content, on the one hand, and form and substance in
expression, on the other, that we find in glossematics. But they do not
help at all in explaining the relation between the substance of
content and the substance of expression, i.e., between those
parts of the entire structure that approach most closely the
usual conception of meaning and sound. Moreover, it is easy to
prove the absence of parallelism, the basic impossibility of a
direct correlation of the two substances by the method of
commutation, which, by the way, was most convincingly done by
Siertsema⁵ ([26], p. 149). As Siertsema has shown, a picture of
the attempt to define the correlations would have the form
shown in Figure 1.

gray       blue       green

[grei]     [blu:]     [gri:n]

Figure 1. Result of an Attempt To Define Correlations
of Content and Expression.

⁴ In principle, the same should also be applied, as the result of a full
symmetry of levels, to the relationship of form and substance in an expression, as
in this example:

Language    A cross section of the roof of the mouth,
            from lips to pharynx

English     p | t | k
Lettish     p | t | k (velopalatal) | k (velar)
Eskimo      p | t | k (velar) | k (uvular)

That is, if English k includes the entire "palato-uvulo-velar zone," Lettish
separates the velar and velopalatal, while Eskimo separates the uvular and
velar.

⁵ It is not without interest to note that in Siertsema's opinion this basic
non-correlatability is the irrefutable proof of the basic unacceptability of the
"method of commutation," which is the cornerstone of the whole glossematic
structure.

It is entirely possible that all the extending bonds between
linguistic research and the concepts and categories of
mathematical (theoretical) logic will lead to a complete
reorientation of the methods of glossematics and of its tests. Still, from
the viewpoint of the special and concrete problems of
linguistics, the following position remains, in fact, effective: Although
descriptive linguistics proceeds from the homoplanar principle,
and glossematics from the heteroplanar, in the treatment of the
phenomena presented here, the factual understanding of the
correlation of sound and meaning is (if one puts aside
terminological differences and gets to the heart of the matter) the
same in both these directions. Therefore, Hjelmslev's scheme,


                         Linguistics

The substance | The form     | The form        | The substance
of content    | of content   | of expression   | of expression


emerging from his definition of linguistics as a science in which
the study of expression is not "phonetics," and in which the
study of content is not "semantics," fully corresponds to
descriptive linguistics' postulated exclusion of phonetics as
"prelinguistic," and to its exclusion of meaning from research. Much
has been said and written about this. For both schools, the
basic subject of linguistic science is the study of "structures";
reduction of these structures to two levels also appears, in fact, to
be general for both: in essence, Hjelmslev's "unit of
expression" and "unit of content" correspond exactly with the
"phoneme" and "morpheme" of descriptive linguistics.⁶


3.  A  Compromise  between  the  Homoplanar  and 
Heteroplanar  Approaches 

As  noted  frequently  in  the  literature,  meaning,  though  in  fact 
the  basis  of  all  descriptive  morphology,  has  not  received  either 
official  recognition  or  any  definition  in  this  linguistic  school. 
Therefore, it is quite intriguing to come upon a treatment of
this question of meaning in Gleason's book, An Introduction to
Descriptive Linguistics [5] (especially since this book has been
translated into Russian and has thus become widely known
among Soviet linguists).

⁶ As is known, the publication in America of Hjelmslev's work, in English,
has greatly facilitated closer relations between the two schools [12]. E. Haugen's
review on this subject is very interesting (IJAL, Vol. 20, No. 3, July, 1954, pp.
247-251).

It seems especially easy to draw a parallel between Hjelmslev's "prolegomena"
and Z. S. Harris' generalized monograph on descriptive linguistics [29]. Both
Hjelmslev and Harris completely repudiate the concepts of morphology and
syntax (Hjelmslev, p. 76 of the Danish edition; Harris, p. 262); they both
develop methods for segmentation into "immediate constituents" on the basis of
substitution (Hjelmslev, p. 38; Harris, p. 369); both tend toward a description
distinguished by its maximal simplicity, exhaustive character, and consistency;
both consider it their ultimate aim to structure texts of a given language
(Hjelmslev, p. 16; Harris, p. 372), demanding that the researcher leave aside
purely formal criteria, and placing relations at the center of attention, not the
related objects (entities) as such (Hjelmslev, p. 22; Harris, p. 365); both
construct a description at sequential levels, which together make up what Hjelmslev
calls a "hierarchy." (A hierarchy is a class of classes, consisting of a series of
segmentations from the largest to the smallest units that can be obtained
through research. Each such class is similar to what in American linguistics is
called a "level.")

The basic difference, which according to Hjelmslev is that research should begin
with entire texts (in order to segment them subsequently on the basis of
commutation), cannot be considered essential; in fact, it is entirely unimportant
whether one begins with the largest or the smallest units, especially since in
practice linguists usually begin "in the middle" (in medias res), using as "texts"
such utterances as are easily reproduced within substitution frames.

Gleason's solution to the question of sound and meaning is
very interesting. Being above all a popularizer, Gleason
immediately ran into the complete impossibility of a comprehensive
and consistent presentation of the question of meaning, and its
relation to sound, in the terminology of descriptive linguistics.
But the terminology of glossematics seemed too abstract to be
presented without change as a basis for the concrete methods
of description that compose the main part of the book. For this
reason, he took the terminology of glossematics, but gave it, in
addition, a much simpler and more commonly acceptable
meaning. The result: "Linguistics is the science which attempts to
understand language from the point of view of its internal
structure" (see p. 2). To penetrate the structure of language,
one must bear in mind the fact that "language operates with
two kinds of material. One of these is sound. . . . The other is
ideas, social situations, meanings—English lacks any really
acceptable term to cover the whole range—the facts or fantasies about
man's existence, the things man reacts to and tries to convey to
his fellows. These two, insofar as they concern linguists, may
conveniently be labeled expression and content" (see p. 2).

It  turns  out  that  language,  from  the  nature  of  the  "two  lev- 
els" that  create  it,  is  represented  by  two  constituents— content 
and  expression.  Consequently,  it  is  pointless  to  insist  on  the  ne- 
cessity of  developing  methods  that  would  allow  the  study  of 
language  without  resort  to  meaning,  and  one  of  the  basic  the- 
oretical postulates  of  descriptive  linguistics  is  completely  re- 
futed. For  Gleason,  too,  the  basic  subject  of  linguistics  remains 
"the  internal  structure  of  language,"  but  it  must  not  be  studied 
by  "purely  formal  methods,  with  no  regard  to  meaning." 

As  has  often  seemed  to  be  the  case,  this  basic  postulate  of  de- 
scriptive linguistics  was  justifiable  by  the  basic  impossibility  for 
the  linguist  scientifically  to  define  meaning.  Definitions  of 
meanings  should  occupy  all  the  other  sciences,  which  study  not 
words  and  their  combination  in  speech,  but  objects  (e.g.,  only 
an  astronomer  can  define  the  meaning  of  the  word  "moon"). 
The  possibilities  for  a  scientific  description  of  meaning,  and, 
in  general,  a  solution  to  this  question  from  "mechanistic"  ("op- 
erational") positions,  are  not  now  proposed;  nevertheless,  al- 
though the  nature  of  content  in  language  remains  undeter- 
mined as  before  (something  for  which  there  is  no  suitable 
word but which, in general, includes various things such as
"ideas, social situations, meanings . . ."; see above), now, it is
not so far removed, and it constitutes a part of language along
with expression.

From  the  above  exposition,  it  is  clear  that  although  Gleason 
uses  the  terminology  of  glossematics,  he  applies  it  arbitrarily: 
For  him,  content  and  expression  are,  respectively,  what  Hjelms- 
lev  calls  the  substance  of  content  and  the  substance  of  expres- 
sion. The  structure  of  expression  may  be  obtained  by  research 
on  the  series  of  sounds  regularly  recurring  and  subject  to  pre- 
diction. As  for  the  structure  of  content,  although  the  idea  of  a 
strictly  parallel  structure  on  both  levels  is  preserved  here  as 
well,  a  much  less  clear  explanation  is  forthcoming.  "The  speaker 
includes  what  he  says  within  the  limits  of  an  organizing  struc- 
ture. This  structure  forces  him  to  choose  several  features  for 
description,  and  determines  the  means  by  which  he  interrelates 
them.  It  also  analyzes  the  situation  in  a  particular  fashion. 
These  selected  features,  like  the  above-mentioned  sounds,  also 
form  patterns  which  recur,  partially  predictable.  These  recur- 
ring patterns  are  the  structure  of  content." 

It  seemed  useful  to  pursue  somewhat  further  the  correspond- 
ing sections  of  Gleason's  textbook,  since  they  very  clearly  re- 
flect the  unsatisfactory  condition  that  obtains  in  the  study  of 
one  of  the  basic  (if  not  the  basic)  questions  of  linguistics,  and 
in  the  two  most  important  directions  of  linguistic  structuralism. 
Gleason's  textbook  is  quite  typical  in  this  respect,  since  the  au- 
thor did  not  take  into  account  a  criticism  of  the  "descriptive" 
and  "glossematic"  concepts— a  criticism  at  once  very  serious  and 
convincing,  and  having  a  more  and  more  definite  and  categori- 
cal nature. 


4. "Primary" and "Secondary" Segmentation in Language

Among  those  works  which,  in  criticizing  the  extremeness  of 
descriptive  linguistics  and  glossematics,  compare  constructive 
doctrines  to  the  doctrines  of  the  former,  an  article  by  A.  Mar- 
tinet, "Linguistic  Arbitrariness  and  Double  Articulation"  [18], 
is  especially  interesting  for  a  study  of  the  question  of  sound 
and meaning (les sons et le sens); this article gives the results
of the previous criticism of the concept of isomorphism [2]. By
insisting on the absolute parallelism of the two levels, content
and expression, glossematics (and after it, descriptive linguistics
as well) gives a distorted picture of the actual state of affairs.
Even leaving aside the excesses of the transcendental doctrine
that regards content and expression as only the "functors
of the sign function," and turning from Gleason's generally
available interpretation, the main points will still be overlooked:
(1) the hierarchy of meaning and sound, the subordination
of the second to the first, the leading role of the first in
relation to the second,⁷ and (2) the necessity arising therefrom
for a much finer analysis of the subject, an analysis that shows
most convincingly that the question is in no way reducible to
two sides (faces) of a linguistic "sign."⁸ There cannot be the
least  doubt  that  the  basic  opposition  is  an  opposition  of  pho- 
nemes, on  the  one  hand,  and  the  designator-designated,  on  the 
other.  In  schematic  form,  this  relationship  (if  we  turn  to  the 
example  shown  in  Figure  1)  will  appear  as  given  in  the  table 
below. 


⁷ ". . . la subordination des sons au sens qui semble incompatible avec le
parallélisme intégral que postule la théorie" (he has in mind the theory of
isomorphism); A. Martinet [7], p. 105.

Usually  (normally),  meaning  is  so  leading  and  determining  in  the  communi- 
cative functioning  of  linguistic  units  that  those  using  a  language  do  not  notice 
the  many  properties  of  the  level  of  expression— the  sound  material  of  language. 
The  latter  emerges  at  the  former  level  and  takes  on  independent  meaning  only 
in  special  cases,  such  as  onomatopoeia  (Lautmalerei),  proper  names,  etc.  O.  S. 
Akhmanova  has  devoted  an  article  in  the  jubilee  collection  for  Acad.  Petrovici 
to  the  specific  role  of  "sound"  in  proper  names. 

⁸ As R. O. Jakobson [14] has most convincingly shown, the inherent tendency
of some representatives of neosaussurianism to join up whatever you please
within the framework of the "designator-designated" dichotomy is the fruit of
a misunderstanding. This is not De Saussure's innovation but simply a repetition
of a theme 2,000 years old. Definition of the sign (signe) as a juncture of
designator and designated coincides literally with the semeion of the Stoics,
composed of a semainon and a semainomenon, and with the adaptation of the
ancient Greek model by St. Augustine in the form signum = signans + signatum.
One should add (and this has become especially obvious since the publication
of R. Godel's work [11]) that, as frequently happens to posthumous
publications that did not receive the author's approval, something emerged in
the form of a finished (and original) concept which, to the author, was only
a stage in the yet incomplete development of a direction of research.

1st Segmentation*                                     | 2nd Segmentation*
(première articulation)                               | (deuxième articulation)
Bilateral Units (morphemes and words)                 | Unilateral Units (phonemes)
-------------------------+----------------------------+-----------------------------------
[blu:] as a whole, as a  | [blue], corresponding to a | differentiating the sound capsule
global, unsegmentable    | certain part of the        | of a given morpheme from all other
unit, unitarily and      | spectrum, etc.             | morphemes [b] [l] [u:]. The basic
directly related to      |                            | property of these units is that
meaning, having a        |                            | they are not immediately
definite semantics       |                            | correlatable with meaning.

* In Martinet's terminology, the first segmentation yields a minimal bilateral
unit (the morpheme of the majority of structuralists), and the second yields
sequential and minimal unilateral units, of which the basic function is
differentiation (phonemes).
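
The contrast drawn in the table can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class name Sign, the helper functions, and the words other than "blue" are hypothetical and do not come from Martinet or from the present text. Each bilateral unit pairs a sound form with a meaning, while the phonemes that make up the form carry no meaning of their own and serve only to keep one sign apart from another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """A bilateral unit of the first segmentation: a sound form tied to a meaning."""
    form: tuple      # phonemes, the unilateral units of the second segmentation
    meaning: str     # rough gloss of the designated content


# Hypothetical mini-lexicon; only "blue" is taken from the text.
LEXICON = [
    Sign(("b", "l", "u:"), "a part of the colour spectrum"),
    Sign(("g", "l", "u:"), "an adhesive"),
    Sign(("b", "r", "u:"), "to prepare a drink"),
]


def differing_positions(a, b):
    """Return the positions where two equally long forms differ, else None."""
    if len(a.form) != len(b.form):
        return None
    return [i for i, (x, y) in enumerate(zip(a.form, b.form)) if x != y]


def minimal_pairs(lexicon):
    """Collect pairs of signs that are kept apart by exactly one phoneme."""
    pairs = []
    for i, a in enumerate(lexicon):
        for b in lexicon[i + 1:]:
            diff = differing_positions(a, b)
            if diff is not None and len(diff) == 1:
                pairs.append((a, b, diff[0]))
    return pairs


if __name__ == "__main__":
    for a, b, pos in minimal_pairs(LEXICON):
        print(a.form, "vs", b.form, "differ only at position", pos)
```

In this toy lexicon the phoneme [b] has no gloss of its own; its whole contribution is that replacing it changes which bilateral unit is being expressed.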


In other words (a reality clearly not reflected in the abstraction
of the glossematic scheme) there are, on the one hand, distinguished
("opposed") phonemes and, on the other hand, designators
and designata, "objects" (in the broad meaning) and
words: bilateral units, basically different from phonemes, which
are the unilateral units of the "differential level."⁹

⁹ The question of the relation of primary and secondary segmentation
received very interesting treatment in A. Martinet's well-known studies in
historical phonology [7]. Although the author of this article disagrees with A.
Martinet's basic tenets (including the role of the system in change in the sound
relations, the concept of the principle of "economy," the role of sound
properties which are not "differential," etc.; see L. Zgusta's review on this subject
in Archiv Orientální, Vol. 27, 1959, pp. 338-341), the system of contemporary
phonological concepts is presented in this book so clearly, and the contemporary
problematics of "secondary segmentation" is developed so fully, that the value
of this book to the science of "sound languages" can hardly be overestimated.

Of course, when we say words (and not morphemes), we deliberately
break with the established tradition of contemporary
linguistics, since morphemes, as ultimate units of the first segmentation
(or "semantic level"), must theoretically be the basis
of the whole consideration. However, if we simplify the discussion
on the theoretical level, the preference of morpheme to
word artificially simplifies and schematizes the facts. But the
basic problem of contemporary linguistics is precisely to free
itself from schemes and dogmas and to turn to a study of the
contemporary problems of human communication in all their
real complexity and uniqueness. The ultimate units of real
communication are the minimal components of sentences,
namely words. And words have complex meanings, composed of different
elements and segmentable at various levels and in various
aspects. For example:

(1) "Categorical" Meaning. A word's belonging to one
"word class" or another (for us, the parts of speech) provides
it with a categorical semantic capsule (a general or "categorical"
meaning, belonging to an entire given class or series of
words), with which is encapsulated a particular, individual
part of the semantics of a word, its "semantic nucleus." Therefore,
for example, the Russian gorod, compared with the French
ville and the English "town," will yield the picture shown in
Figure 2.


[Figure 2. Categorical (Semantic) Capsules, shown for gorod, ville, and "town."]


The  "categorical  (semantic)  capsule"  of  a  word  is  tied  in, 
on  the  one  hand,  with  general  rules  for  its  combination  with 
other  words  (its  grammatic  requirements)  and,  on  the  other 
hand,  with  its  individual  lexical  nucleus  and  its  own  special 
meaning  content.  Therefore,  one  or  another  categorical  seman- 
tic capsule  of  a  word  may  more  or  less  closely  adjoin  its  indi- 
vidual nucleus,  or  it  may,  on  the  contrary,  be  easily  separable 
from  the  latter:  In  some  cases,  what  is  contained  as  one  aspect 
in  the  nucleus  is  simply  emphasized— e.g.,  the  aspect  of  sub- 
stantivity  in  the  semantics  of  the  word  gorod;  in  other  instances, 
the  categorical  semantic  capsule  can  add  to  the  semantic  nu- 
cleus a  certain  aspect  not  inherent  in  it  as  such— e.g.,  the  aspect 
of  "substantivity"  in  the  semantics  of  a  deverbative  noun. 

(2) Meaning Derived from Component Morphemes. Corresponding
with the morphological structure of the base of
a word, in its general semantics, we can more or less distinctly
separate out the meanings of individual word-forming morphemes
entering into its composition, i.e., such morphemes as
the roots and lexical (word-forming) affixes. The meaning of
an individual morpheme within the composition of a word may
be only very distantly and obliquely connected with its general
lexical semantics, and not only not help in its discovery
but even hinder it, such as in the Russian otrada [joy, delight]
(why ot [from, of]? [rada = joy]); in gromootvod [lightning
rod; lit., thunder conductor] (which does not carry off thunder);
in the German Walfisch (whale-fish, although a whale is
not a fish), etc.¹⁰

The  selection  and  classification  of  "elementary  meanings," 
with  the  purpose  of  subsequently  rendering  them  by  nonlin- 
guistic  symbols  on  some  large  scale,  presupposes  the  solution 
(if  only  in  working  form)  of  a  number  of  problems  of  general 
semantics  that  we  shall  discuss  below.  But  this  solution  remains 
as  yet  undiscovered.  Furthermore,  up  to  the  present,  even  the 
method  of  approach  to  these  problems  is  unclear.  On  the  one 
hand,  we  have  every  reason  to  believe  that  "pure  meaning"  is, 
in  general,  nonexistent,  and  that  there  exists  only  the  meaning 
bound  up  with  language;  every  language  already  bears  its  own 
particular  variant  of  cumulative  meanings,  its  own  complex 
image  of  reality.  On  the  other  hand,  a  directly  contrary  ap- 
proach is  still  postulated  with  great  insistence.  It  is  insisted  that 
the  direct  analysis  of  languages,  predominant  at  present,  must 
unavoidably  end  in  failure,  just  as  a  physicist  would  prove  help- 
less if  he  tried  from  the  beginning  to  apply  his  own  laws  to  na- 
tural objects— trees,  rocks,  etc.  The  physicist  starts  by  relating 
his  laws  to  the  simplest  constructions— to  an  ideal  lever,  mathe- 
matical pendulums,  point  masses,  etc.  Armed  only  with  laws 
relating  to  these  constructions,  he  later  finds  himself  able  to 
analyze the complex behavior of real bodies, and thus to regulate
them.


¹⁰ For a more detailed treatment of this question, see [3], p. 43 et seq., and
[1].  A  special  work  by  A.  I.  Smirnitsky  [8]  is  devoted  to  a  detailed  consideration 
of  the  question  of  sound  and  meaning  from  the  point  of  view  of  the  semantic 
structure of a word.

As we shall show later, the basic problems of general semantics
were in the past subjected to calculation and proof by practice
with concrete problems in the composition of subsidiary international
languages. Now, these problems take on new meaning
and special relevance in connection with the problem of so-called
machine language, about which a few words may be said
here.¹¹

The problem of an artificial language (machine language)
as a general language for computational linguistics emerges in
two forms: in the area of machine translation, in the form of
an interlingua [intermediary language; Tr.];¹² and in the area
of machine information retrieval [search of literature; Tr.], in
the form of an information language. The distinction between
an interlingua and an information language lies in the fact that
the first has as its goal the translation into a single system of
the entire content of those languages among which it "mediates,"
to become a generalized net of relations, having no right
to discard or drop any single item of the content that exists
within the languages bound by it. Therefore, one can accept
a definition of it in terms of a "recoding" of natural languages,
and regularly study different languages as only an alternation
of the code (one language is a codified form of another). As
distinct from an interlingua, the language of the information
machine is free to simplify or to complicate its structure as desired,
arbitrarily to select and combine "elementary meanings."
Therefore, the question of an information language touches on
the basic problems of language structure in general, questions
of its modeling and rationalization, and has great significance
for general linguistics.
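
The distinction just drawn can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the code values, the descriptor SETTLEMENT, and the function names are hypothetical and are not taken from any actual interlingua or information language. The interlingua is modeled as a recoding that must be reversible, since it may drop nothing; the information language is modeled as a free selection of a few "elementary meanings," from which the original message cannot be recovered.

```python
# Hypothetical toy vocabularies; none of these codes come from the text.
INTERLINGUA = {"gorod": "C01", "bolshoi": "Q07", "staryi": "Q12"}
DECODE = {code: word for word, code in INTERLINGUA.items()}

# An information language indexes only the meanings it chooses to keep.
DESCRIPTORS = {"gorod": "SETTLEMENT"}


def to_interlingua(words):
    """Recode every word of the message; no item of content may be dropped."""
    return [INTERLINGUA[w] for w in words]


def from_interlingua(codes):
    """Invert the recoding, recovering the original message."""
    return [DECODE[c] for c in codes]


def to_information_language(words):
    """Keep only the indexed words, as an unordered set of descriptors."""
    return {DESCRIPTORS[w] for w in words if w in DESCRIPTORS}


if __name__ == "__main__":
    message = ["staryi", "bolshoi", "gorod"]
    print(from_interlingua(to_interlingua(message)) == message)
    print(to_information_language(message))
```

The round trip through the toy interlingua returns the message unchanged, whereas the descriptor set keeps only what the designer of the information language decided to index.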

¹¹ One should note that there already exist several projects on artificial
languages: see, for example, the Semantic Code Dictionary of J. Perry, A. Kent,
and J. L. Melton [22]; further, R. Ranganathan, "Natural, Classificatory, and
Machine Languages," in Information Retrieval and Machine Translation (A.
Kent, ed., Interscience Publishers, New York, 1961, Part 2, pp. 1029-1037);
Bolting's "Steniglott" in the Proceedings of the Cleveland Conference for
Standards on a Common Language for Machine Searching (September 6-12, 1959);
the "Descriptors" of Mooers (C. N. Mooers, "Zatocoding Applied to Mechanical
Organization of Knowledge," American Documentation, Vol. 2, 1951, pp. 20-32);
et al.

¹² See Chapter IV, Sec. 5.

In working out machine languages, as in classic interlinguistics,
"autonomists" and "naturalists" (see [4], p. 47) are
clearly distinguishable. Bolting's Steniglott gives 4,000 words
for Greek, Latin, English, German, and Slavic languages. The
most concrete application of the naturalist principle to the creation
of an information language is found in J. Perry and A.
Kent; the semantic multiplier for a class of machines will be
MACH; for a class of crystals, CRIS; for a class of documents,
DOCM; for a class of diffusions, DIFF; for a class of gases,
GASS; for a class of minerals, MINR; for a class of metals,
METL; etc.
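
The seven codes quoted above can be held in a simple lookup table, and their "naturalist" character can be probed with a rough check. The sketch below rests on assumptions: the dictionary keys are the English class names used in this translation, and the subsequence test is an illustrative heuristic of our own, not the procedure by which Perry and Kent constructed their codes.

```python
# The seven codes quoted in the text, keyed by the English class names.
SEMANTIC_CODES = {
    "machines": "MACH",
    "crystals": "CRIS",
    "documents": "DOCM",
    "diffusions": "DIFF",
    "gases": "GASS",
    "minerals": "MINR",
    "metals": "METL",
}


def is_subsequence(code, word):
    """True if the letters of code occur in word in the same order."""
    letters = iter(word.upper())
    # Each membership test consumes the iterator up to the matching letter.
    return all(ch in letters for ch in code)


def mnemonic_classes(codes):
    """Class names whose code can be read off the English word itself."""
    return sorted(name for name, code in codes.items()
                  if is_subsequence(code, name))


if __name__ == "__main__":
    print(mnemonic_classes(SEMANTIC_CODES))
```

Under this heuristic six of the seven codes can be read directly from the English word. CRIS cannot be read from "crystals," because the English spelling has no letter I, so that code evidently does not follow the English spelling.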


CHAPTER  II 

The Place of Semantics in
Modern Linguistics

1.  Linguistic  Meaning  and  Translation 

As  was  shown  in  the  preceding  chapter,  a  basically  new  ap- 
proach to  the  question  of  meaning  is  urgently  necessary  for  the 
development  of  modern  methods  in  linguistic  research.  Al- 
though this  new  approach  has  only  recently  appeared  before 
researchers  in  its  fullest  form,  many  different  efforts  to  develop 
it  have  occupied  "interlinguists"  of  various  tendencies  for  a  long 
time.  Many  of  these  efforts  not  only  have  not  yet  lost  their  sig- 
nificance but,  on  the  contrary,  have  now  taken  on  special  mean- 
ing in  connection  with  the  basically  new  problems  mentioned 
above— various  forms  of  modeling  and  rationalization  of  lin- 
guistic communication,  which,  by  the  way,  are  closely  bound 
with  new  content  inserted  in  the  concept  of  translation.  In  con- 
temporary linguistics,  translation  is  not  unjustifiably  defined 
as  one  of  the  basic  problems  of  human  communication:  One  or 
another  form  of  "translation"  necessarily  presupposes  not  only 
the  study  of  a  foreign  language  and  the  mastery  of  one's  own, 
but  also  mastery  of  every  expression  and  communication  of 
thought  and  of  experience.  It  is  quite  obvious  that  the  central 
problem  in  translation  (and,  consequently,  in  contemporary 
linguistics in general) is the question of meaning.¹ Therefore,
this question now appears at the center of the linguist's attention.

¹ On the problem of meaning as a basic problem in MT, see Chapter IV.

This  extremely  broad  concept  of  the  essence  of  translation 
lies  at  the  basis  of  the  concept  of  meaning  as  a  purely  semeiotic 
fact;  from  this  point  of  view,  both  for  linguists  and  for  all  who 
use  languages  naturally,  the  meaning  of  a  linguistic  "sign"— con- 
cretely speaking,  "word"— is  nothing  else  but  the  translation  of 
it  by  means  of  another  sign,  usually  one  more  fully  developed. 
Translation  is  of  three  kinds:  (1)  intralingual,  i.e.,  the  expla- 
nation of  verbal  signs  by  means  of  other  signs  in  the  same  lan- 
guage; (2)  interlingual,  i.e.,  properly  speaking,  the  translation 
or  explanation  of  verbal  signs  in  one  language  by  means  of  ver- 
bal signs  in  another  language;  and  (3)  intersemeiotic  transla- 
tion, or  transmutation,  which  means  the  explanation  of  verbal 
signs  by  means  of  nonverbal  sign  systems  [15].  The  most  wide- 
spread and  up  to  now  the  most  important  form  of  translation 
in  practice,  i.e.,  interlingual  translation,  usually  presupposes  not 
the  exchange  of  individual  signs  in  one  language  for  individual 
signs  in  another,  as  "code  units,"  but  rather  the  replacement  of 
whole  statements  in  one  language  by  equivalent  statements  in 
the  other,  i.e.,  equivalence  in  difference.  This  latter  is  the  basic 
problem  of  language  and  the  main  subject  of  linguistics.  With- 
out a  solution  to  this  problem,  neither  a  description  of  languages 
nor  the  creation  of  dictionaries  and  bilingual  grammars  is  pos- 
sible. 

All  that  is  available  to  knowledge  can  be  expressed  in  any  ex- 
isting language;  hence,  the  absence  of  grammatical  correspond- 
ence can  be  supplemented  by  lexical  means.  It  is  more  difficult 
to  replace  missing  special  expressions,  terms,  and  words.  Since 
languages  are  distinguishable  for  the  most  part  not  by  what 
they  can  express  but  rather  by  what  they  must  express,  repeated 
translation  of  a  message  between  two  languages  tends  to  im- 
poverish it.  However,  the  fuller  the  context,  the  less  will  be  the 
loss  of  information. 

The  ability  to  speak  in  a  certain  language  includes  an  abil- 
ity to  speak  about  that  language;  a  conscious  (or  intellectual) 
level  of  language  permits  and  even  requires  interpretation  by 
an exchange of code, or by translation. Therefore on the conscious
or intellectual level there can be no discussion of
untranslatability. But in mythology, magic formulas, poetry, etc.,
the picture changes basically. Here is involved not only translation
in its essential meaning but also creative transposition,
which  can  take  three  forms:  (1)  intralingual  transposition  from 
one  poetic  form  to  another;  (2)  interlingual  transposition  from 
one  language  to  another;  and  (3)  intersemeiotic  transposition 
from  one  symbolic  system  to  another,  i.e.,  from  the  sphere  of  the 
artistic  word  to  music,  dance,  cinematography,  or  painting. 

This  new  and  very  interesting  approach  to  translation  as  a 
new  and  perhaps  key  problem  of  linguistics  in  general,  i.e.,  a 
clear  and  productive  classification  of  the  various  aspects  of  trans- 
lation (especially  productive  in  promoting  the  development  of 
automatic  translation,  where  the  problem  of  intersemeiotic 
translation  takes  on  a  special  meaning),  deserves  the  most  seri- 
ous attention.  However,  its  general  methodological  premises  de- 
mand serious  discussion.  From  the  standpoint  of  the  most  gen- 
eral foundations  for  corresponding  structures,  the  whole  mul- 
tiplicity of  studies  can  be  reduced  to  two  basic  approaches:  (1) 
studies  proceeding  from  an  understanding  of  meaning  as  a  se- 
meiotic  fact  in  the  sense  developed  above;  (2)  studies  proceed- 
ing from  the  fact  that  "understanding"  a  word  or  group  of 
words,  and  consequently  "accepting"  them  in  general,  and  then 
"transmitting"  the  linguistic  communication  are  possible  only 
when there exists some conception of an "extralinguistic" object,
designated by a given linguistic unit, or a conception of
the  phenomenon  of  objective  activity.  Therefore,  the  meaning 
of  a  word  cannot  be  reduced  to  translation  or  to  "metalinguis- 
tic definition." 

The  meaning  of  a  word  is  a  reflection  of  an  object,  a  phe- 
nomenon, or  a  relation  in  conception  (or  a  mental  formation, 
analogous  in  character,  constructed  from  reflections  of  the  in- 
dividual elements  of  the  activity);  it  enters  the  structure  of  a 
word as its so-called internal aspect, with respect to which the
sound  of  a  word  emerges  as  the  material  shell  necessary  not 
only  for  the  expression  of  meaning  and  for  its  communication 
to  other  people  but  also  for  its  appearance,  formulation,  exist- 
ence, and  development.  Therefore,  if,  for  example,  a  person 
blind  from  birth  has  never  seen  chalk,  milk,  snow,  or  any  other 
white  object  in  general,  then  the  meaning  of  the  word  "white" 
will  never  become  fully  manifest  to  him.  The  normal  speaker 
has never seen centaurs or unicorns, but he is not at all surprised
that they are defined in every language by the same words as a
horse is, because in his extralinguistic experience he has had revealed
to him the attributes of the real animal and so does not
need a particularly extravagant flight of fancy in order to transfer
them to imaginary beings. The meaning of a word used in
a particular native language to designate objects as completely
distinct (from the point of view of Central European culture)
as "egg," "deceased," and "bread" is manifest only to one who
has acquired an extralinguistic familiarity with these objects in
a given cultural area, and who has seen an oval form ascribed
not only to bread but also to the bodies of the dead at burial,
etc.²

It  follows  from  the  above  that  translation  is  the  language-re- 
ceiver's creation  of  a  natural  equivalent  to  a  message,  natural 
because  nearest  in  both  meaning  and  style.  In  order  to  attain 
the  most  complete  communication  possible,  one  needs  not  only 
a  high  degree  of  mastery  of  the  linguistic  structures  involved 
but  also  a  deep  penetration  into  the  differences  between  the 
cultures being compared. Hence, the concept of an "ethnolinguistic
structure of communication." Therefore, the problem
of comparing meanings in various languages grows more complex
as the possibility of deriving the comparison from nonverbal
stimuli (i.e., from nonlinguistic situations) decreases, and as
the compared languages differ more from each other with respect
to their culture and history. Under these conditions, it becomes
highly complicated to correlate meanings in the languages
which are being compared by means of the extralinguistic
situation and by derivation of the meaningful parts of
a communication.³


² It is thought that the examples introduced by R. Jakobson in the article
cited  in  Chapter  I  deal  with  the  same  matter;  in  order  to  translate  into  English 
prinesi  syru  i  tvorogu  as  "bring  cheese  and  cottage  cheese,"  the  translator  has 
to  refer  to  the  object,  to  extralinguistic  reality;  he  must  imagine  what  certain 
objects  (phenomena,  relations,  etc.)  are  called  in  the  life  of  a  particular  people, 
or  what  he  can  most  conveniently  and  comprehensibly  call  them  if  such  objects 
are  unknown  to  a  particular  people  or  are  uncommon  in  its  everyday  life. 

^  See  Willard  Quine  [24],  pp.  153-154.  For  example,  a  rabbit  runs  by  and  a 
native  says  "gavagai";  on  the  level  of  "empirical"  meaning  there  is  every  reason 
to  correlate  this  "gavagai"  with  "rabbit"  or  "There's  a  rabbit."  But  should 
there  be  a  "terminological"  correspondence  between  gavagai  and  "(There's)  a 
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Unlike  the  "ethnolinguistic,"  the  "properly  linguistic"  or 
"microlinguistic"  structure  of  a  communication  stands  out  most 
sharply  in  the  distinctions  of  word  classifications  (the  noncoin- 
cidence  of  word-classes  and  of  the  generalized  meanings  con- 
nected with  them),  which  are  distinctions  in  the  systems  of 
grammatical  categories,  especially  on  the  level  of  noncoinci- 
dence  of  necessary  grammatical  information.*  But  the  main  dif- 
ficulty lies  in  the  different  relationship  of  abstract  and  concrete 
words  (i.e.,  the  differences  in  the  semantic  structure  of  words), 
the  noncoincidence  of  their  grammatical  spheres,  the  specifics 
of  the  phraseological  bonds,  the  different  relations  of  linguistic 
form  and  semantic  function,  and  so  on.  All  of  these  and  similar 
questions  urgently  demand  the  quickest  possible  solution,  since 
without  it  further  development  of  our  science  is  impossible. 


2.  The  Question  of  Linguistic  Meaning  and  the  Search  for 
Semantic  Universality 

The  definition  of  meaning  (of  words)  cited  above  is  based  on 
a  delimitation  of  the  internal  side  of  language  as  a  specific  lin- 
guistic category  of  the  concept  that  apparently  comprises  the 
category  of  logic. 

However, it does not follow from what has been said that
logical  categories  in  general  should  not  occupy  the  linguist,  or 
that  he  should  remain  indifferent  to  the  question  regarding  the 
nature  of  conception.  On  the  contrary,  it  is  extremely  impor- 

rabbit"?  That  is,  does  the  utterance  in  the  native  language  mean  the  same  as 
"rabbit"  does  in  English;  or  does  gavagai  refer,  unlike  "rabbit,"  to  any  small 
quadruped  or,  on  the  contrary,  only  to  a  particular  species  of  rabbit,  requiring 
a  lengthy  description  to  designate  it  in  English?  These  are  questions  to  which 
our attention is called constantly and repetitiously in general semeiology (Whorf
and  Sapir  give  interesting  materials  from  a  comparison  of  English  and  North 
American  Indian  languages). 

^  For  this  reason,  the  translation  of  verse  13,  chapter  4,  of  the  Gospel  accord- 
ing to  St.  Matthew  into  Villa  Alta,  for  example,  turned  out  to  be  difficult;  in 
this  particular  dialect  of  the  Zapotec  language  (Southern  Mexico),  the  opposi- 
tion of  completed  versus  repeated  action  is  a  grammatic  necessity,  so  that  it  is 
impossible  to  know  whether  Christ  visited  Capernaum  before  the  event  described 
there  [19]. 
tant to explain the relation of meaning and concept. But not
nearly  enough  has  been  done  as  yet  in  this  area.  Therefore,  the 
problem  now  remains  of  explaining  the  relationship  between 
meaning  and  concept  in  particular  (scientific  concept),  which 
textbooks on logic apparently attempt to do; one textbook defines
concept as "an idea about an object which defines its essential
characteristics" (V. F. Asmus, Logika [Logic], p. 33).^
Until  this  is  done,  it  will  remain  unclear  what  relationship  this 
higher  concept  has  toward  words  and  their  meanings,  i.e.,  to- 
ward the  objectively  existing  units  of  various  languages  as  actu- 
ally spoken.  Indeed,  natural  human  thought  cannot  exist  with- 
out language,  linguistic  terms,  and  phrases;  apparently,  this 
also  remains  true  for  thoughts  that  define  an  object's  essential 
characteristics.^  It  is  here,  in  fact,  that  the  cooperation  of  spe- 


^  Logika,  Gospolitizdat,  Moscow,  1947. 

*  In  treating  these  questions,  it  may  be  useful  to  distinguish  the  "meaning" 
of a word from its "sense," which is what A. I. Smirnitsky does in his above-mentioned,
as yet unpublished work: "The meaning of a word quite frequently
seems  to  be  not  monolithic  but  structurally  complex  and  therefore  to  distinguish 
components  in  its  own  make-up  that  might  not  correspond  to  any  object  at- 
tributes that  are  separate  in  the  sense  of  the  word,  or  that  might  correspond 
to  them  only  approximately  or  conditionally.  In  other  words,  the  meaning  of 
a  word  might  have  a  complex  composition  and  a  definite  structure  that  may 
or  may  not,  entirely  or  exactly,  be  aspects  of  the  word's  sense,  i.e.,  aspects  of 
the  theoretical  or  practical  concept  expressed  by  the  word.  Thus,  in  the  word 
gromootvod  [lightning  rod,  lit.,  thunder  conductor]  the  components  grom(o)- 
and  -otvod  are  clearly  distinguishable;  consequently,  the  meaning  of  grom- 
somehow  enters  into  the  composition  of  its  semantics,  although  this  semantic 
component  does  not  exist  in  the  sense  of  this  word,  since  a  gromootvod  is 
generally  understood  to  be  an  apparatus  for  "leading  off"  not  thunder  but  an 
electrical  charge— lightning.  The  semantic  component  grom  does  not  enter  into 
the  structure  of  the  concept;  although  the  latter  is  practical  and  is  expressed 
on  a  general  scale  by  the  word  gromootvod,  still  it  exists  in  the  general  seman- 
tics of  the  word. 

"The  semantic  formation  of  a  word  can  sometimes  be  tolerated  by  a  language 
for  a  long  time  (i.e.,  by  the  society  speaking  a  given  language)  in  a  particular 
form,  even  when  it  is  essentially  divergent  from  the  word's  sense,  if  the  word 
can  be  construed  as  a  kind  of  conditionality,  or  as  a  joke,  etc.  But  in  some 
well-known  cases,  depending  on  various  concrete  circumstances,  the  contradic- 
tion between  the  semantic  formation  of  a  word  and  its  sense  leads  to  a  change, 
or  to  the  substitution  of  another  word  not  containing  such  a  contradiction.  In 
contemporary  literature,  the  name  gromootvod  is  more  and  more  frequently 
being replaced by the word molnieotvod [lit., lightning conductor], which
cialists in various disciplines (linguistics, mathematics, philosophy [logic],
psychology, etc.) is extremely important. Much
attention  has  recently  been  turned  to  such  cooperation.  How- 
ever, in  seeking  new  and  more  complete  means  of  answering 
these  questions  it  is  necessary,  as  we  have  already  said  above,  to 
consider  the  experience  of  previous  work,  primarily  of  the  "in- 
terlinguists,"  i.e.,  of  those  scientists  who,  from  various  positions 
and  by  varying  methods,  sought  possibilities  for  the  creation  of 
a  rationally  constructed,  logical,  and  convenient  international 
auxiliary  language.^ 

As  we  know,  in  many  areas  of  human  communication  there 
have  long  existed  international  semeiotic  systems  recognized  by 
everyone:  The  international  telegraph  code,  the  international 
metric  system  of  weights  and  measures,  musical  notation,  math- 
ematical symbolism,  etc.  The  work  in  progress  on  standardiza- 
tion of  terminology  in  various  areas  of  science  and  technology 
is  constantly  pursuing  this  goal.^ 

In  spite  of  the  success  already  achieved,  however,  many  ques- 
tions connected  with  the  principles  of  the  formation  and  func- 
tion of  auxiliary  languages  still  remain  arguable  and  incom- 
pletely answered.  From  this  fact  there  emerges  a  lack  of  unity 
and  coordination  of  effort  that  is  harmful  to  the  development 
of  international  communication  through  the  medium  of  an  aux- 
iliary language.  As  is  generally  true  of  questions  of  translation, 
these  questions  have  a  general-linguistics  character;  one  might 
say  that  a  survey  of  the  main  questions  regarding  the  formula- 
tion of  auxiliary  languages  is  essentially  the  same  as  a  survey 
of  the  most  crucial  problems  of  linguistics  in  general. 

Rationalization  of  communication  through  auxiliary  interna- 

describes  the  matter  more  accurately.  The  relationship  between  the  semantic 
formation  of  a  word  and  its  sense  is,  as  we  know,  historically  variable;  it 
changes  primarily  in  relation  to  the  development  of  the  sense,  conditioned  by 
the  development  of  a  society  and  its  new  encounters  with  various  aspects  of 
life,  and  also  in  relation  to  the  aspects  of  a  language's  internal  development— 
with  changes  in  its  phonology,  grammatical  structure,  and  vocabulary." 

'  For  a  short  treatment  of  the  history  and  contemporary  status  of  the  ques- 
tion of  an  international  auxiliary  language,  see  [4]. 

*As  we  know,  this  work  was  begun  thirty  years  ago  by  the  International 
Federation  of  National  Associations  for  the  Determination  of  Standards;  the 
Academy  of  Sciences  of  the  U.S.S.R.,  through  its  projects  on  technical  nomen- 
clature, took  a  large  part  in  this  work  (for  materials,  see  [13]). 
tional media in the areas enumerated above was easily attainable
because of the complete determinability and autonomy of
the designata. Thus, for example, a meter is 1/40,000,000th of
the Earth's meridian; a centimeter is 1/100th of a meter, and
a millimeter is 1/1,000th. The integral sign $\int$ is a symbol representing
the limit of a sum; $\prod_{i=1}^{n}$ is a symbol that represents
the  product  of  n  terms;  "dot-dash,"  or  a  group  consisting  of 
one  short  and  one  long  impulse,  is  a  simple  conditional  equiva- 
lent of  the  letter  a,  regardless  of  its  position  in  a  word  or  of 
what  the  word  or  group  of  words  means,  or  even  of  whether 
the  word  means  anything  at  all  (the  Morse  alphabet  can  trans- 
mit with  equal  success  real  words  and  phrases  with  real  mean- 
ing or  meaningless  collections  of  letters).  International  stand- 
ardization of  terminology  is  based  on  exact  definition  of  the 
designated  objects,  and  without  such  a  basis  it  would  be  impos- 
sible and  meaningless.  Difficulties  arise  in  the  standardization 
of  specialized  terminology  whenever  a  choice  has  to  be  made 
among  various  terms  present  in  different  languages;  here,  we 
run  into  questions  of  national  preference,  word-formation  pos- 
sibilities, and  relations  with  other  terms.  But  still,  however  dif- 
ficult it  may  be  to  decide  whether,  for  example,  to  call  a 
certain  substance  "gasoline"  (English),  "essence"  (French), 
"Benzin"  (German),  or  even  something  else,  this  is  not  the 
basic  or  most  important  difficulty.  The  "designatum,"  its  na- 
ture, and  its  properties  are  quite  precisely  and  "autonomously" 
definable.^  Whichever  word  one  may  choose  to  designate  a  par- 
ticular object,  a  precise  and  monolithic  definition  of  that  object 
itself  must  necessarily  precede  such  a  choice. 
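
The independence of such a code from meaning can be shown in a minimal sketch. The Python fragment below is an illustration added here, not part of the original exposition; the table is a deliberately partial excerpt of the international Morse alphabet, and the names MORSE and encode are hypothetical.

```python
# Partial excerpt of the international Morse alphabet (illustration only).
MORSE = {"a": ".-", "e": ".", "n": "-.", "t": "-"}


def encode(text: str) -> str:
    """Encode letter by letter; a letter's code never depends on its context."""
    return " ".join(MORSE[ch] for ch in text.lower())


real_word = encode("ant")   # a real English word
nonsense = encode("tna")    # the same letters in a meaningless order

# The letter "a" receives the same "dot-dash" group in both strings.
same_code_for_a = real_word.split()[0] == nonsense.split()[-1]
```

The encoder consults only the table, never a dictionary of words, which is the sense in which the designatum here is completely determinable and autonomous.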

Furthermore,  to  apply  this  principle  in  formulating  an  in- 
ternational auxiliary  language  that  would  purport  to  aid  com- 
munication among  the  peoples  of  diverse  and  distant  regions, 
we  need  definitions  of  the  most  widely  divergent  designata, 
and  a  complete  listing  and  classification  of  them,  and  at  the 
same  time  definitions  of  designata  in  the  broadest  possible 
sense— not  just  objects,  actions,  and  attributes,  but  various  con- 


*In  this  connection,  it  is  interesting  to  note  that  all  these  words,  like  many 
modern  technical  terms  in  general,  are  incidentally  "artificial"  or  "contrived" 
words,  for  which  reason  a  choice  of  one  of  them  would  not  be  forced  by 
reference  to  the  "fundamental"  or  "natural"  elements  of  language. 
tents of relations: direction, cause, independence, and other
complex logical and psychological categories. Such attempts
have been made and are constantly being made.^10 One can see
how great a value is attached to this question from the multiplicity
of different "semantic" theories.^11 However, most research
in  the  latter  direction  has  an  abstract,  philosophical  character, 
and  very  little  to  do  with  the  physical,  linguistic  side  of  the 
question,  or  with  the  factual  peculiarities  of  human  communi- 
cation through  language.  In  their  attempt  to  regulate  human 
communication,  researchers  often  take  a  sharply  a-prioristic  po- 
sition, based  on  abstract  rationalistic  structures  and  not  on 
studies  of  fact.  They  do  not  study  the  possible  ways  to  regulate 
it  semantically  according  to  the  principle  of  the  preservation  of 
mutual  understandings  already  arrived  at  within  natural,  his- 
torically developed  limits. 

In  the  present  publication  it  is  neither  possible  nor  necessary 
to  consider  in  detail  the  condition  of  general  semantics,  or  the 
possibility  in  principle  of  a  complete  inventory  of  all  the  "des- 
ignata"  known  to  modern,  developed  languages  in  the  form  of 
"universals,"  autonomous  with  respect  to  these  languages.  This 
question  is  mentioned  here  only  insofar  as  it  seemed  necessary 
to  state,  on  the  one  hand,  that  the  linguistic  specialists  and  cre- 
ators of  auxiliary  international  languages  (who,  as  we  know, 
were  for  the  most  part  not  linguists)  are  aware  of  the  primary 
importance  of  general-linguistics  research  and,  on  the  other 
hand,  that  the  existing  auxiliary  international  languages  were 
set  up  (developed)  without  any  fundamental,  preparatory  in- 


"  As  an  example,  one  may  refer  to  the  system  of  forty-five  correlative  words 
worked  out  for  systematic  expression  of  the  basic  differences  between  the  con- 
cepts of  quality,  motive,  time,  place,  means,  possession,  the  quantity  of  an 
object,  and  its  individualization.  Note  particularly,  as  a  sample  of  especially 
deep  and  intense  analysis,  the  semantic  research  of  E.  Sapir  performed  for 
the  International  Auxiliary  Language  Association— "Totality"  [27]  and  "Grading" 
[28]. 

"  By  these  we  mean  the  various  forms  of  general  semantics  associated  with 
the  philosophical  semantics  of  Morris  and  Carnap  and,  on  a  more  general 
level, with the logical positivism of Russell and Whitehead. A very clear exposition
of the bases of "general semantics" is presented in A. Korzybski's book
Science and Sanity [16]. In many respects, Korzybski's ideas correspond with
the  position  of  the  "science  of  symbolism"  of  Ogden  and  Richards,  formulated 
before  1928  [20]. 
ventory of semantic categories, which is why the work of their
authors,  unlike  that  of  the  international  terminological  com- 
missions, could  not  be  built  on  the  principle  of  a  search  for  the 
most  convenient  and  effective  means  of  designating  some  quite 
distinct,  previously  defined  content.  As  we  know,  this  basic  prob- 
lem of  general  semantics— i.e.,  the  problem  of  separating  out, 
inventorying,  and  classifying  all  the  meanings  and  connotations 
inherent  in  human  languages— is  at  present  not  only  unsolved 
but  is  not  even  being  studied  broadly  and  intensively  enough 
to  obtain  a  clear  perspective  of  its  solution. 

Naturally,  such  a  state  of  affairs  in  the  area  of  general  seman- 
tics (as  the  term  is  used  here)  not  only  continues  to  hinder 
work  on  the  formulation  and  perfection  of  auxiliary  interna- 
tional languages  but  also  is  the  reason  for  the  insufficiencies  and 
inconsistencies  inherent  in  them. 

In  order  to  consolidate  our  remarks  about  the  complexity  of 
the  designatum-designator  relation  in  language,  which  causes 
difficulties  in  the  development  of  a  rational  auxiliary  means  of 
international  communication,  we  shall  present  several  exam- 
ples. 

The  methods  recommended  by  A.  Martinet  in  his  well-known 
article  [17]  for  choosing  roots  for  an  international  European 
auxiliary  language  can  be  illustrated  in  the  choice  of  a  word  for 
the concept bashmak [shoe] (la notion de soulier). But what
is la notion de soulier (conditionally translated above as the
concept bashmak)? The Russian word bashmak stands both
for footwear and for a technical contrivance (like the English
"shoe"), but in French such a technical contrivance would be
called not soulier but sabot. Unlike Russian, English and German
use their [ʃu] even more broadly; e.g., the English
"snowshoes" are varieties of lyzha [skis, snowshoes], and the German
word for perchatka [glove] is Handschuh [lit., hand-shoe].
On the other hand, the basic Russian word for this concept is
obuv' [footwear]: u nee mnogo obuvi [she has many shoes (of
various  kinds)];  v  etom  magazine  prodaetsya  obuv'  [they  sell 
shoes  in  this  shop];  obuvnoj  magazin  [shoeshop],  etc.,  while  in 
English  and  German,  for  example,  although  such  generalized 
expressions  as  "footwear,"  Schuhwerk,  etc.,  theoretically  exist, 
their use is in fact so limited and specialized that for an Englishman
the natural "concept" will be "shoes," or Schuhe, etc.,
rather  than  the  more  collective  equivalent  of  the  Russian  obuv' 
—"footwear,"  Schuhwerk,  etc. 

One  basic  and  quite  properly  separate  principle  in  formulat- 
ing an  auxiliary  international  language  is  economy  of  words— 
they  should  be  so  chosen  as  not  to  hide  the  individual  "concep- 
tual areas,"  but  rather  to  give  all  the  strictly  necessary  mean- 
ings, and  to  teach  users  how  to  do  without  semantic  differentia- 
tions that  are  not  strictly  necessary  for  normal  communication. 
But  realization  of  this  very  proper  principle,  and  actual  appli- 
cation of  it,  force  one  at  every  step  to  deal  with  the  question  of 
what  is  really  necessary.  Which  semantic  distinctions  are 
really  superfluous  or  inessential?  Is  it  essential  or  inessential, 
for example, in speaking about ruka [hand, arm], to distinguish
between and differentiate its two parts (main [hand]
and bras [arm])? And if this is not essential, then is it worth
while  to  distinguish,  for  example,  plecho  [shoulder]?  Why  not 
simply  call  the  whole  member  by  one  word  and  teach  everyone 
to  think  "economically"  about  the  corresponding  object  (or 
group  of  objects  united  under  one  name)?  Is  it  necessary,  in 
the  auxiliary  language,  to  distinguish  motions  directed  toward 
from  motions  directed  away  from  the  speaker;  are  distinctions 
of  the  type  "go-come"  (or,  correspondingly,  gehen-kommen  or 
aller-venir)  essential  from  the  standpoint  of  economy  and 
choice? Russian gets along with the one word idti. We [Russians. Tr.]
do not feel uncomfortable when we say idi syuda
[come  here]  and  idi  tuda  [go  there];  but  for  an  Englishman  or 
Frenchman  it  would  be  meaningless  to  say  "go  here"  or  vas  ici. 

One  question  concerning  the  internal  form  of  the  designator 
is  closely  related  to  the  problems  just  described,  i.e.,  the  ques- 
tion of  what  internal  structuring  is  most  convenient.  Thus,  for 
example, the concept dom [home, house] (le concept de maison,
Martinet, op. cit., same page) on the general-semantics
level  is  sufficiently  clearly  defined  as  such  from  the  point  of 
view  of  the  modern  conception  of  European  houses.  But  dom 
is  simultaneously  "building,"  "habitation,"  and  "shelter,"  or 
"refuge,"  etc.  Apparently,  definite  cultural-historical  causes 
forced  the  ancient  Germans,  for  example,  to  refuse  any  variants 
of  the  root  *dem/dom  as  expression  of  this  particular  concept— 
a root not only known in ancient Germanic languages but preserved
in the modern ones (compare, for example, German Zimmer
[room, chamber] and English "timber," etc.). There is
hardly  any  doubt  that  in  choosing  the  most  proper  of  existent 
"semantemes"  for  the  auxiliary  international  language,  it  is  not 
sufficient  merely  to  avoid  misunderstandings  and  accidental 
(homonymic)  correspondences;  careful  research  in  their  seman- 
tics is  also  necessary  from  the  standpoint  of  their  internal  struc- 
ture and  of  their  adaptability  to  the  most  adequate  and  perfect 
expression  of  the  corresponding  concepts. 

These  aspects  can  be  illustrated  most  graphically  by  exam- 
ples of  lexical  morphology  (the  morphology  of  word-forma- 
tion). One  can  hardly  doubt  that  it  is  most  essential  to  the  aux- 
iliary international  language  that  the  greatest  possible  number 
of  words  entering  into  it  be  "motivated,"  that  they  have  a  trans- 
parent structure  and  a  rationally  constructed  relationship  be- 
tween designator  and  designatum;  furthermore,  the  combina- 
tion of  morphemes  composing  them  must  be  monovalent  and 
reversible.^12 But even this completely clear and rational general
rule, this correct general principle, is far from simple to
realize  in  practice.  Indeed,  why  not  construct  the  entire  system 
of  qualitative  adjectives  on  an  antonymic  basis?  Let  each  se- 
manteme designate  a  given  quality  in  its  "positive"  manifesta- 
tion, and  let  the  absence  of  this  quality,  or  the  presence  of  the 
opposite  quality,  be  designated  by  means  of  a  single,  caritive 
prefix. This is done, for example, in Esperanto: facila (easy),
mal/facila (difficult); nova (new), mal/nova (old); antaŭ (before),
mal/antaŭ (after); etc. As has already been entirely correctly indicated
in the literature,^13 in practice such a system often leads
to  obscurity  and  doubt,  especially  if  we  note  that  mal  is  well 

"  Thus,  for  example,  if  solution  of  all  the  complex  problems  dealing  with 
the  selection  of  a  morpheme  from  among  tud-,  worg-,  or  erg-,  org-,  labor-,  etc., 
leads to the choice of labor-, then labor/ist should mean worker, labor/ist/ar/o,
workers in general; labor/em/a should mean industrious, inclined toward hard
work; labor/ist/al/a would be the adjective for working; and anti/labor/ist/al/a
would  have  the  corresponding  opposite  meaning.  At  the  same  time,  the  complex 
structure  obtained  should  be  just  as  easily  and  sequentially  subjectable  to  the 
reverse  process  of  expansion  to  the  same  identical  parts  as  were  used  in  the 
synthesis. 

"  For  example,  see  [13]. 
known to be a Romance prefix having a pejorative, not just
caritive,  meaning.  In  this  context,  we  cannot  fail  to  note  yet  an- 
other question:  Is  there  always  an  actual,  real  preference  in  us- 
age for  a  complex  or  clear,  morphologically  sequential,  and  reg- 
ularly productive  word,  as  compared  with  a  simple  word?  For 
example,  ochki  [eyeglasses]  is  called  okulvitroj  in  Esperanto, 
binoklo  in  Ido,  oculvitres  in  Occidental,  lunetes  in  Novial,  and 
perspecillos  in  Interlingua.  Of  course,  if,  abstractly  speaking, 
the  analytic  definition  of  ochki  as  eyeglasses  has  some  particu- 
lar preference,  then,  in  fact,  in  real-language  usage  the  connec- 
tion of  this  concept  with  that  sound,  fixedly  adapted  in  prac- 
tice in  language  communication,  may  be  much  more  effective, 
eliciting  a  conception  of  the  corresponding  object  much  more 
quickly and immediately. Therefore, if binoklo, for example,
entered  various  languages  and  took  on  a  positive  "interna- 
tional" character  in  the  existing  natural  languages,  then  would 
it  not  be  simplest  to  introduce  it  into  the  auxiliary  language 
(of  course,  reserving  the  right  for  each  language  using  it  to 
form  any  complex  and  productive  words  as  needed,  on  the  ba- 
sis of  the  principle  of  morphological  monovalence  and  reversi- 
bility)? 
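
The requirement of reversibility discussed above can be made concrete in a minimal sketch. The Python fragment below is an illustration added here under stated assumptions: the morpheme inventory is a hypothetical one built from the labor- example, the names INVENTORY, synthesize, segmentations, and is_reversible are invented for the purpose, and only reversibility (a unique decomposition into the parts used in the synthesis) is modeled, not monovalence of meaning.

```python
from typing import List

# Hypothetical morpheme inventory, taken from the labor- example.
INVENTORY = ["anti", "labor", "ist", "ar", "al", "o", "a"]


def synthesize(morphemes: List[str]) -> str:
    """Join morphemes into a word (synthesis)."""
    return "".join(morphemes)


def segmentations(word: str, inventory: List[str]) -> List[List[str]]:
    """Return every way of splitting the word into inventory morphemes."""
    if word == "":
        return [[]]
    results = []
    for morpheme in inventory:
        if word.startswith(morpheme):
            for rest in segmentations(word[len(morpheme):], inventory):
                results.append([morpheme] + rest)
    return results


def is_reversible(morphemes: List[str], inventory: List[str]) -> bool:
    """True if analysis recovers exactly the parts used in the synthesis."""
    return segmentations(synthesize(morphemes), inventory) == [list(morphemes)]


workers = ["labor", "ist", "ar", "o"]
unique = is_reversible(workers, INVENTORY)

# Adding one overlapping (hypothetical) morpheme destroys reversibility,
# because the word can then be split in two ways.
ambiguous = segmentations("laboristaro", INVENTORY + ["aro"])
```

With the original inventory the word decomposes in exactly one way; once the overlapping morpheme is admitted, two analyses compete, which is the kind of "idiomatic surprise" an auxiliary language is supposed to exclude.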

As  we  have  already  seen  in  the  above  example  with  the  prefix 
mal-, it is often difficult to attribute to lexical affixes the abstract
character necessary for monovalent and reversible word-formation.
For this reason, the great interest that the composers
of  auxiliary  international  languages  show  in  the  phenomenon 
of  conversion  is  quite  natural,  since  conversion  is  a  widely  dis- 
tributed method  of  word-formation  in  which  the  individual 
morphemes,  or  productive  affixes,  alone  do  not  serve  as  the 
word-forming  materials— rather,  the  word's  paradigm  does.  It 
would  seem  that,  unlike  productive  affixes,  the  paradigm,  or 
system  of  grammatic  affixes,  has  the  advantage  of  allowing  one 
to  define  in  communication  such  fully  clear  and  definite  mean- 
ings as  verbality,  substantivity,  attribution,  etc.  But,  actually, 
even  here  the  formulator  of  an  auxiliary  international  language 
runs  into  great  difficulty.  In  the  first  place,  what  does  it  mean 
to translate a certain root and another paradigm when, for instance,
it is necessary to obtain a verb-meaning from a substantive
root? In English, for example, "to shop" (and, correspondingly,
(he) shops, shopped, shopping, etc.) means "to go into
a  shop  to  make  a  purchase";  yet,  for  example,  "to  ship"  (ships, 
shipped,  shipping,  etc.)  hardly  means  "to  go  after  something 
on a ship," but "to convey by ship"; "to paper" (where "paper"
means bumaga) primarily means "to put up wallpaper
with  glue"  [obkleivat'  oboyami]  (although  oboi  is  not  in  gen- 
eral just  paper,  but  a  particular  kind— wallpaper);  "to  chain" 
in  English  means  "to  bind  with  a  chain,"  while  the  French 
verb chaîner means "to measure with a chain" (mesurer avec
la chaîne). In English, the verb "to feather," formed by conversion
from the noun-root "feather" [pero] means "to line
with  feathers"  (with  reference  to  birds'  nests,  and  used  transi- 
tively), while  the  German  federn  has  three  meanings,  different 
from  that  of  the  corresponding  English  verb.  Of  course,  there 
are correspondences among various languages; e.g., English
"crown," "to crown," French couronne, couronner, and German
Krone, krönen are quite sufficiently alike in the morphological-semantic
respect. But word-formation in the auxiliary
international  language  must  be  monovalent  and  reversible.  It 
must  be  entirely  ideal  and  cannot  tolerate  idiomatic  surprises. 
For  this  reason,  conversion  does  not  help  much  in  solving  the 
problem.  Nor  can  the  question  of  general-meaning  affixes  (i.e., 
general  verb-affixes,  general  noun-affixes,  etc.)  be  considered  en- 
tirely clear.  We  can  illustrate  the  complexity  of  this  question 
with  an  example  from  Ido,  for  which  there  was  originally  pro- 
posed the  application  of  the  suffix  -if  for  the  meaning  "to  pro- 
duce," "to  divide  out";  -ig  for  "to  do,"  "to  make,"  "to  transform"; 
-iz  for  "to  accumulate,"  "to  guarantee";  and  -ag  for  "to  use  as 
an  instrument."  Since  a  practical  application  of  this  rule  turned 
out  to  be  very  difficult,  because  of  the  necessity  of  solving  in 
every  instance  various  problems  of  a  syntactic-semantic  nature, 
such  as  the  transitivity  or  intransitivity  of  verbs,  an  attempt 
was  made  to  substitute  the  single  verbal  suffix  -i  for  all  these 
various  suffixes;  however,  this,  too,  failed  to  attain  wide  distri- 
bution. For  this  reason,  further  work  in  this  area  continues  to 
be  carried  out  for  case  after  case,  without  any  real  results  be- 
ing obtained  that  might  lead  toward  a  fundamental  solution  of 
these  general  problems. 
CHAPTER III


Several Types of
Linguistic Meanings


It  is  clear  from  the  preceding  exposition  that  one  of  the  most 
important problems confronting linguists is the explanation of
what  types  or  varieties  of  meanings  exist  in  language,  and  how 
to  distinguish  them  and  separate  them  from  one  another.  But 
it  is  hardly  possible  to  deal  further  with  questions  of  semantics 
without  perfecting  the  corresponding  metalanguage.  As  the 
first  step,  we  shall  deal  with  the  question  of  so-called  gramma- 
tic,  syntactic,  and  lexical  meanings,  and  in  this  regard  we  shall 
try  to  define  such  widely  used  terms  as  "grammar,"  "syntax," 
and  "morphology,"  although  we  do  not  claim  to  have  the  last 
word  in  defining  these  concepts. 

Everything  expressed  in  language  represents  the  level  of  con- 
tent, or  the  sum,  of  linguistic  meanings.  Linguistic  meanings 
("designations"),  from  the  standpoint  of  just  what  is  being  ex- 
pressed, are  of  two  types: 

(1)  Where  the  designations  are  definable  as  relations  among 
linguistic  elements  (such  as  morphemes,  words,  and  sentences), 
i.e.,  where  some  linguistic  elements  serve  as  symbols  of  rela- 
tions among  other  linguistic  elements,  we  shall  speak  of  syn- 
tactic meanings. 

(2)  In  all  other  cases,  i.e.,  where  the  designations  are  not 
linguistic  relations  but  rather  something  outside  of  language, 
or where they are some particular facts of reality (objects, actions,
properties, abstract concepts, representations, etc.), or a
relation  of  utterance  to  actuality,  i.e.,  where  linguistic  elements 
serve  as  symbols  of  something  extralinguistic,  we  shall  speak  of 
lexical  meanings. 

The  concept  of  syntactic  and  nonsyntactic  indicators  (and, 
correspondingly,  of  meanings)  can  be  defined  more  concretely 
as  follows:  Indicators  are  considered  syntactic  when  they  are 
used  only  in  syntactic  analysis  of  a  text,  i.e.,  when  they  are  nec- 
essary only  in  order  to  find  a  governor  for  each  word;  all  other 
indicators  are  considered  to  be  nonsyntactic. 

We  shall  note  further  that  the  name  "lexical"  is  temporary 
for  all  nonsyntactic  meanings;  it  will  suffice  until  we  can  find  a 
better  term.  One  could  call  nonsyntactic  meanings  referential, 
and  then  further  distinguish  lexical  and  some  other  types  of 
meanings  among  them.  But  this  is  a  matter  for  further  study. 

Lexical and syntactic meanings must be expressed in all languages
(see E. Sapir, Language, 1921). This means that in no
language  is  an  utterance  without  meaning  if  it  consists  of  ele- 
ments expressing  both  lexical  and  syntactic  meanings.  Here 
such  meanings  are  necessarily  expressed  in  general,  and  not 
specifically  as  being  of  one  or  the  other  type.  In  other  words, 
language  as  a  symbolic  system  demands  the  expression  of  both 
lexical  and  syntactic  meanings  in  every  utterance,  but  it  is  ir- 
relevant for  language  in  general  (and  in  individual  languages) 
just  what  meanings  are  expressed;  this  is  determined  by  the 
content  of  an  utterance,  i.e.,  by  extralinguistic  factors. 

Linguistic  meanings  (designations)  are  distinguishable  from 
yet  another  standpoint.  It  may  be  the  case  that  in  one  language 
several  quite  concrete^  meanings  (perhaps  both  lexical  and  syn- 
tactic) must  be  expressed,  but  not  so  in  another  language. 

The  concrete  meanings  necessarily  expressed  in  a  given  lan- 
guage can  be  called  the  grammatical  meanings  of  that  language. 
Meanings  not  necessarily  or  individually  expressed  in  a  given 
language  may  be  called  the  nongrammatical  meanings  of  that 
language. 

^ Here, and in what follows, the word "concrete" is used in the sense of
"particular," "given," "just exactly this."

The statement that "grammatical meanings in a given language
must be expressed in that language" has the following
significance. For this purpose, meanings have a variety of
indicators, one of which must appear in any utterance in which
there  is  an  element  present  whose  meaning  can  be  joined  (se- 
mantically)  with  a  particular  grammatical  meaning.  Thus,  in 
some  languages  a  word  of  a  particular  class  cannot  be  used  with- 
out indicators  having  corresponding  grammatical  meanings. 
Among  these  indicators  there  may  be  a  zero;  in  that  case  the 
physical  absence  of  an  indicator  is  understood  to  be  just  such  a 
zero  indicator.  Thus,  in  English,  the  meaning  of  number  is 
grammatical, and every noun must be accompanied by an indicator
of number (zero for the singular, -s for the plural). In Chinese, the
meaning  of  number  is  nongrammatical;  therefore,  although  a 
noun may be accompanied by a number indicator (yige and
other  enumeratives  for  the  singular,  men  for  the  plural),  this 
is  not  necessary.  The  absence  of  an  indicator  is  not  taken  to  be 
a  zero-th  indicator,  and  if  in  the  Chinese  noun  the  number  in- 
dicator is  physically  absent,  then  the  meaning  of  number  for 
this  noun  remains  unexpressed.^ 
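
The contrast between an obligatory (grammatical) indicator and an
optional (nongrammatical) one can be sketched in a few lines of Python.
This is only an illustration under our own assumptions: the function
name number_meaning and the two small indicator tables are hypothetical
and are not part of the original exposition.

```python
from typing import Dict, Optional

# Hypothetical indicator tables, for illustration only.
INDICATORS: Dict[str, Dict[str, str]] = {
    "english": {"-s": "plural"},
    "chinese": {"yige": "singular", "men": "plural"},
}

# Languages in which number is grammatical: the physical absence of an
# indicator is itself read as a zero indicator with this meaning.
ZERO_MEANING: Dict[str, str] = {"english": "singular"}


def number_meaning(language: str, indicator: Optional[str]) -> Optional[str]:
    """Return the number meaning expressed, or None if it stays unexpressed."""
    if indicator is None:
        # Absence counts as a zero indicator only where the meaning is
        # obligatory; otherwise nothing is expressed at all.
        return ZERO_MEANING.get(language)
    return INDICATORS[language].get(indicator)
```

With these tables, a missing indicator yields "singular" for English
but None (number unexpressed) for Chinese.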

The  question  of  whether  a  meaning  is  grammatical  often 
leads  to  a  question  about  the  presence  of  a  zero  indicator  among 
the  indicators  of  that  meaning. 

In  other  words,  some  designators  (indicators)  are  optional 
from  the  standpoint  of  a  language  system:  Their  use  is  deter- 
mined by  extralinguistic  factors  (content)  and  their  absence  is 
not  discounted  as  being  a  zero  indicator.  Other  designators  are 
necessary  from  the  point  of  view  of  the  language  itself:  Their 
use  is  determined  by  the  language's  structure  and  their  absence 
is  considered  to  be  an  indicator.  Nongrammatical  indicators 
correspond  to  the  first  type,  grammatical  to  the  second. 

In  practice  it  is  not  always  easy  to  differentiate  between  op- 
tional and  necessary  indicators  (i.e.,  to  determine  the  presence 
of  a  zero  among  the  indicators  of  a  given  meaning),  because 
there  are  many  transitional  cases.  For  each  concrete  meaning 
(and,  correspondingly,  for  its  indicators),  special  study  is 
needed. However, this problem lies beyond the scope of the
present chapter; for our purposes it is sufficient to assume that
we are able in some way or another to distinguish the
grammatical meanings in a language from the nongrammatical.

^ "In Chinese, as in Japanese, any noun can be used with reference both to
a real singular and to a real plural of an object; in other words, it does not
formally contain a specification of number within itself." (A. I. Ivanov, E. D.
Polivanov, Grammatika sovremennogo kitajskogo yazyka [A Grammar of Modern
Chinese], 1930, pp. 218-219.)

Grammatical  meanings  can  be  both  lexical  and  syntactic.  For 
example,  noun-number  meaning  in  Russian  is  lexical  (the  dis- 
tinction of  nouns  by  number  is  conditioned  by  extralinguistic 
distinctions)  and  grammatical  (since  noun  number  must  be  ex- 
pressed in  Russian).  Likewise,  the  meanings  of  gender,  num- 
ber, and  case  are,  for  Russian  adjectives,  grammatical  and  also 
syntactic  (gender-number-case  distinctions  in  adjectives  are  not 
connected  with  any  extralinguistic  distinctions  but  merely  re- 
flect the  syntactic  bonds  of  the  adjective). 

These  grammatical  meanings  define  the  specifics  of  a  lan- 
guage. The  general  arsenal  of  linguistic  meanings  (i.e.,  what 
can  be  expressed  in  a  language)  is  about  the  same  for  all  lan- 
guages. And  languages  differ  primarily  in  that  one  language 
"prefers"  certain  meanings  and  makes  them  obligatory,  i.e., 
grammaticizes  them,  while  another  language  does  this  with 
other  meanings.  There  may  be  languages  that  do  not  have 
grammatical— i.e.,  concrete,  obligatory— meanings;  this  was  true 
of  ancient  Chinese. 

The  relation  of  the  grammatical,  on  the  one  hand,  and  the 
syntactic  and  lexical,  on  the  other,  can  be  schematized  as  in 
Table  1. 


TABLE 1

                                                  Meanings
                                       Nongrammatical       Grammatical
Attributes                            Lexical  Syntactic  Lexical  Syntactic

1. Must this attribute be expressed?     -         -         +         +
2. Are the expressed relations
   intralingual?                         -         +         -         +

In  this  regard,  language  theory  can  be  divided  into  lexicol- 
ogy, syntax,  and  grammar.  Lexicology  deals  with  the  expres- 
sion of  extralinguistic  factors,  whereas  syntax  has  to  do  with 
the  expression  of  all  possible  relations  among  linguistic  elements. 
Grammar  occupies  an  intermediate  position  between  lexicology 
and syntax; it deals with both lexical and syntactic meanings,
but  only  with  those  which  must  be  expressed  in  a  certain  lan- 
guage (i.e.,  grammatical  meanings). 

The  term  "grammar"  is  applied  here  in  a  narrower  sense 
than  the  generally  accepted  one;  usually,  grammar  is  under- 
stood to  be  not  only  the  study  of  grammatical  meanings  but 
also  the  study  of  the  relations  among  language  elements— syn- 
tax. In  order  to  avoid  ambiguity  and  contradiction  of  the  gen- 
erally accepted  terminology,  the  word  "grammar"  will  be  ap- 
plied in  the  usual,  traditional  sense,  and  the  study  of  gramma- 
tical meanings  will  be  called  "grammar  proper." 

All  that  has  been  said  up  to  now  is  related  only  to  the  char- 
acter of  linguistic  meanings  as  independent  of  the  means  of  ex- 
pressing them.  Now  we  shall  turn  to  these  means,  which  are  of 
two  types,  depending  on  whether  meanings  are  expressed  by 
them  within  the  word  or  not: 

(1)  Morphological,  i.e.,  means  for  the  expression  of  any  nec- 
essary linguistic  meanings  within  the  word.  We  identify  affix- 
ing, alternation,  reduplication,  incorporation,  for  example,  as 
morphological  means. 

(2)  Nonmorphological,  i.e.,  means  for  the  expression  of 
meanings  outside  the  word.  Here  we  identify  the  use  of  aux- 
iliary words,  word-order,  etc. 

(The  quite  complex  question  of  word  boundaries  is  not  con- 
sidered here;  for  the  purposes  of  the  present  study  it  is  suffi- 
cient to  suppose  that  we  can  somehow  define  word-boundaries. 
Specifically,  we  consider— as  in  machine  translation— a  word  to 
be  a  group  of  letters  between  two  spaces.) 
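
The working definition of a word used here is simple enough to state
as code. The sketch below is ours, not the authors'; the function name
split_words is a hypothetical choice.

```python
from typing import List


def split_words(text: str) -> List[str]:
    """Treat every maximal run of characters between spaces as one word."""
    # str.split() with no argument splits on runs of whitespace, so
    # punctuation stays attached to the neighboring letters.
    return text.split()
```

For example, split_words("dver' otkryta") gives two words. Anything
expressed inside one of these units counts as morphological under the
definitions above, and anything expressed across them as
nonmorphological.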

The  difference  between  morphological  and  nonmorphologi- 
cal means  is  schematized  in  Table  2. 


TABLE 2

                                              Means
Attribute                          Morphological   Nonmorphological

Do the given means express some
meaning within the word?                 +                 -
As we have seen, the terms "lexical," "syntactic," and
"grammatical" are set apart by two attributes and characterize
meanings independent of the means of expression. These terms
refer to the level of content.

The  terms  "morphological"  and  "nonmorphological"  are  set 
apart  by  a  single  attribute  and  characterize  a  means  of  expres- 
sion independent  of  the  expressed  meanings.  These  terms  re- 
late to  the  level  of  expression. 

The  first  and  second  oppositions  lie  on  different  planes.  For 
this  reason,  the  generally  accepted  subdivision  of  linguistic  the- 
ory into  lexicology,  morphology,  and  syntax  is  not  valid  from 
a  terminological  standpoint.  Even  if  we  disregard  the  defini- 
tions proposed  above,  in  traditional  usage  morphology  is  ordi- 
narily understood  to  mean  the  study  of  the  forms  of  words,  i.e., 
of  the  means  of  expression  by  word-formation  (within  the 
word),  while  lexicology  and  syntax  are  the  studies  of  the  corre- 
sponding meanings.  The  use  of  the  word  "morphology"  in  place 
of  "grammar"  can  be  explained  by  the  fact  that  in  those  lan- 
guages from  whose  study  the  terminology  of  modern  linguistics 
was  formulated  (especially  the  Indo-European  languages), 
grammatical  meanings  are  expressed  mainly  by  morphological 
means,  and,  conversely,  morphological  means  are  preferred  in 
these  languages  for  expression  of  properly  grammatical  mean- 
ings. Hence  the  confusion  of  the  terms  "morphology"  and 
"grammar"  (or,  more  precisely,  "grammar  proper"),  the  termi- 
nologically  inexact  expression  "morphological  category,"  and 
other  difficulties. 

Consequently, it is necessary to draw a distinction between
the types of meanings (lexical and syntactic, grammatical and
nongrammatical) and the types of expression of meanings (by
morphological and nonmorphological^ means). Using this plan
of opposition, one can classify the facts of language; here there
are eight groups:

(1)  Morphological  expression  of  grammatical  lexical  mean- 
ings, e.g.,  indicators  of  number  in  the  nouns  of  French,  Eng- 
lish, Russian,  and  other  languages. 

(2) Morphological expression of grammatical syntactic
meanings, e.g., indicators of gender, number, and case in
Russian adjectives; indicators of gender and number in French
adjectives.

^ Nonmorphological means are frequently called "analytic" and sometimes
even "syntactic."

(3)  Morphological  expression  of  nongrammatical  lexical 
meanings.  Here  the  incorporation  of  lexemes  in  polysynthetic 
languages,  word-compounding  (German,  Hungarian,  and  other 
languages),  and  also  various  instances  of  word-formation  in 
Indo-European,  Finno-Ugric,  Semitic,  and  other  languages  are 
illustrative.  A  clear  example  of  morphological  expression  of 
nongrammatical  lexical  meanings  is  the  change  in  gender  of 
the  Arabic  verb  or  suffixing  of  pronouns. 

(4) Morphological expression of nongrammatical syntactic
meanings, e.g., the writing of prepositions together with a noun
as one word in Arabic (bi, li, etc.); the writing of the copula -a
together with the nominal part of a sentence in Georgian; the
inclusion of indicators in a verb for all its noun modifiers and
conditions in Chinook.

(5)  Nonmorphological  expression  of  grammatical  lexical 
meanings, e.g., articles and compound tenses in French,
English, and German, or indicators (again separate words) of
plural number, such as rnams and dag in Tibetan.

(6)  Nonmorphological  expression  of  grammatical  syntactic 
meanings,  e.g.,  the  particle  to  before  an  infinitive  in  English. 

(7)  Nonmorphological  expression  of  nongrammatical  lexical 
meanings.  This  group  includes  the  most  diverse,  quite  ordinary 
cases: Lexical meanings are expressed by individual words
(lexemes).

It may seem that if some lexical meanings are expressed by
each separate word, then they are expressed within the word
itself, and one should speak of morphological means. But this
is not the case. We shall explain here, and elsewhere in this
book, what is meant by an expression of meaning within a word.
Take the word dver' [door]; this word expresses several
meanings. Now, let us join to the meanings expressed by this word
the lexical meaning of otkrytost' [openness] (i.e., the door is
open). To do this, we must use another word (otkryta [open,
is open]), not just some indicator within the first word (which
we would use if we had to add the meaning of plurality: dveri
[doors]). Therefore, we speak of nonmorphological means of
expressing the nongrammatical lexical meanings with individual
words (lexemes).



(8)  Nonmorphological  expression  of  nongrammatical  syn- 
tactic meanings— conjunctions,  prepositions,  copulas. 
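
The eight groups are simply the combinations of three two-way
oppositions. As a minimal sketch (the names MEANS, OBLIGATION, RELATION,
and classification_groups are our own, hypothetical labels), they can be
enumerated in the same order as in the list above:

```python
from itertools import product
from typing import List

MEANS = ("morphological", "nonmorphological")    # level of expression
OBLIGATION = ("grammatical", "nongrammatical")   # must it be expressed?
RELATION = ("lexical", "syntactic")              # extralingual or intralingual


def classification_groups() -> List[str]:
    """Enumerate the groups in the order (1) through (8) used in the text."""
    return [
        f"{means} expression of {obligation} {relation} meanings"
        for means, obligation, relation in product(MEANS, OBLIGATION, RELATION)
    ]
```

Because the first opposition belongs to the level of expression and the
other two to the level of content, the product makes explicit that the
oppositions are independent of one another.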

As  stated  above,  we  do  not  claim  the  distinctions  and  defini- 
tions introduced  to  be  final  ones.  They  merely  serve  as  an  il- 
lustration of  how  one  may  work  to  make  linguistic  terminology 
more  exact  and  to  create  a  system  of  exact  concepts  without 
which  the  applications  of  new,  precise  methods  to  the  study  of 
language  are  greatly  hindered  and  sometimes  become  impossi- 
ble. 

Precise  terminology  is  important  for  all  areas  of  linguistics 
and  especially  for  machine  translation,  about  which  more  will 
be  said  in  the  next  chapter. 


CHAPTER  IV 


Machine Translation
and  Linguistics 


1.  General  Remarks 

Machine  translation  is  a  new  and  fast-developing  area  of  lin- 
guistics in  which  exact  methods  of  research  are  widely  applied; 
indeed,  they  are  necessary  for  progress. 

An ineluctable part of the work in machine translation (MT)
is the description of linguistic facts, but in a unique form:
namely, as rules for transforming a text in one language into an
equivalent text in another.

These descriptions, consisting of the iteration of necessary
operations, are so exactly drawn up that they can be "accepted"
and used by an electronic computer. Thus, the immediate basic
factor occupying our attention is the description of languages
by exact methods, which can be verified experimentally
by MT.

To  avoid  occupying  ourselves  with  the  complex  theoretical 
question  of  the  existence  of  a  scientific  description,  we  can  stipu- 
late that  the  construction  of  working  models  is  a  highly  effec- 
tive technique  for  creating  and  verifying  a  description  of  any 
system  whatsoever.  We  shall  explain  just  what  this  means. 

Let  us  assume  that  we  are  considering  a  group  of  arbitrary 
objects  generated  by  a  mechanism  unknown  to  us.  This  mech- 
anism is  not  available  for  immediate  observation,  and  we  can 
draw conclusions about it only from the results of its activity,
i.e.,  from  the  properties  of  the  set  of  objects  that  it  has  gener- 
ated. Here  we  are  interested  in  the  particular  mechanism  only 
in  a  strictly  defined  sense:  It  is  important  for  us  to  know  just 
those  aspects  of  its  functioning  that  cause  it  to  generate  the 
particular set. None of the other concrete properties of the
mechanism or of its functioning are relevant to us.

By  analyzing  the  totality  of  objects  available  to  us  from  the 
mechanism,  we  can  create  a  hypothetical  description  of  it.  To 
verify  this  description,  we  can  construct  a  model  of  the  mecha- 
nism based  on  it.  This  would  be  only  a  model  and  not  a  copy  of 
the  mechanism,  since  very  many  concrete  properties  of  the 
mechanism  will  not  have  been  studied,  and  in  some  respects 
the  model  will  not  resemble  the  mechanism  itself  at  all.  But  if 
this  model  in  function  can  generate  exactly  the  same  objects  as 
the  mechanism  studied,  then  we  can  conclude  that  our  model 
is  adequate  in  the  relevant  respects  and  consequently  that  our 
description  is  accurate.  (Of  course,  this  description  does  not 
have  to  be  unique;  other  equally  correct  descriptions  are  pos- 
sible.) 
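
The verification procedure just described can be sketched as follows.
Everything in this example is an assumption made for illustration: the
"mechanism" is a toy one whose observed output is three strings, and the
names model_is_adequate, candidate_model, and rival_model are
hypothetical.

```python
from typing import Callable, Iterable, Set


def model_is_adequate(observed: Iterable[str],
                      model: Callable[[], Iterable[str]]) -> bool:
    """A model is adequate if it generates exactly the observed objects."""
    return set(observed) == set(model())


# Objects produced by the unknown mechanism (a toy assumption).
observed_objects = {"ab", "aabb", "aaabbb"}


def candidate_model() -> Set[str]:
    # One description: n copies of "a" followed by n copies of "b".
    return {"a" * n + "b" * n for n in range(1, 4)}


def rival_model() -> Set[str]:
    # A different description: wrap the previous string in "a" ... "b".
    result: Set[str] = set()
    current = ""
    for _ in range(3):
        current = "a" + current + "b"
        result.add(current)
    return result
```

Both models pass the check although they are built differently, which
is the point of the parenthetical remark above: an accurate description
need not be unique.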

A  model  for  a  generative  mechanism  such  as  that  in  our  ex- 
ample is  directly  relevant  to  linguistics.  Actually,  the  aim  of 
linguistics  is  the  description  of  language,  i.e.,  of  the  system  gen- 
erating speech.  The  system  itself— language— is  not  manifest  to 
the  researcher;  he  must  deal  only  with  the  results  of  its  func- 
tioning—with speech.  To  verify  one's  descriptions,  one  can  cre- 
ate working  models  corresponding  to  them— logical  structures 
that  can  be  realized  in  the  form  of  electronic  circuits  and  that 
must  functionally  generate  speech  objects.  We  think  that  a  de- 
scription can  be  considered  accurate  (although  not  natural)  if 
we  can  create  from  it  a  working  model  capable  of  fulfilling  any 
part  of  the  functions  of  verbal  communication. 

If  the  problem  of  linguistics  is  defined  to  be  the  description 
of  language  as  a  structure  producing  speech,  then  the  aim  of 
MT  is  to  embody  this  description  in  algorithms  that  are  realiz- 
able on  existing  electronic  computers.  By  the  same  token,  MT 
provides  linguistics  with  the  experimental  basis  so  necessary  to 
it;  in  the  course  of  MT,  the  description  of  linguistic  facts  is 
verified  and  made  more  precise,  while  the  methodology  of  lin- 
guistic description  itself  is  perfected.  This  is  the  value  of  MT 
to linguistics. MT specialists, in turn, should use the language
descriptions  created  by  linguistics.  Thus,  linguistics  and  MT 
cannot  develop  successfully  without  one  another,  or  without  a 
constant  exchange  of  results  and  accumulated  experience. 

While  this  statement  of  the  interdependence  of  MT  and  lin- 
guistics is  fine  in  theory,  the  actual  situation  is  different.  A 
paradox  has  arisen.  On  the  one  hand,  MT  work  has  received 
significant  comment  in  linguistic  circles.  Special  articles  on  MT 
are  published  in  such  linguistic  journals  as  Voprosy  yazyko- 
znaniya,  Word,  Modern  Language  Forum,  and  Babel.  Problems 
in  MT  are  discussed  at  international  linguistic  congresses  (the 
Eighth  Congress  of  Linguists  in  Oslo,  the  Fourth  Congress  of 
Slavicists  in  Moscow);  moreover,  linguists  take  a  considerable 
part  in  conferences  on  MT  and  related  problems  (the  First 
Conference  on  MT,  1952,  U.S.A.;  the  First  All-Union  Confer- 
ence on  MT,  1958,  Moscow;  the  First  Conference  on  Mathemat- 
ical Linguistics,  1959,  Leningrad,  etc.).  MT  centers  have  been 
established  at  various  linguistic  institutes  and  also  in  univer- 
sities in  several  countries,  such  as  the  U.S.S.R.,  the  Chinese 
People's  Republic,  Czechoslovakia,  the  U.S.A.,  and  England. 

On  the  other  hand,  MT  remains  a  highly  specialized  area 
that  would  seem,  from  its  special  problems  and  methods,  to  be 
quite separate from theoretical linguistics. At present MT makes
almost no use of the achievements of contemporary linguistics,
whereas pure linguists, while recognizing MT de jure,
completely ignore it de facto in developing their theories.

Yet  MT  is  not  just  another  special  area  of  linguistics  as  are 
studies  of  Indo-European,  Caucasian,  and  Semitic  languages. 
A  specialist  in  Paleoasiatic  languages  could  easily  know  noth- 
ing about  specialized  research  on  Spanish,  nor  does  a  linguist 
studying  lexicography  need  to  deal  with  the  problem  of  case  in 
Caucasian  languages.  But  MT  touches  equally  on  all  special- 
ized areas.  The  study  of  various  languages  and  problems,  using 
the  approach  and  methods  of  MT  that  have  been  proven  by 
experiment,  will  permit  a  future  unification  of  the  science  of 
language.  MT  is  simultaneously  both  a  workshop,  where  the 
methods  of  precise  linguistic  research  are  perfected  independ- 
ently of  the  concrete  sphere  of  application  of  these  methods,  and 
an experimental field, where the results are verified by
experience. Therefore, it is very important for all linguists to learn
as  much  as  they  can  about  MT. 

In  the  following  exposition,  our  purpose  is  not  to  describe 
precisely  the  bonds  between  specific  questions  of  MT  and  the 
corresponding  linguistic  problematics;  for  that  we  need  special 
research  not  yet  undertaken.  Our  problem  is  purely  indicative: 
to  give  a  short  description  of  some  of  the  problems  of  MT  that 
seemed  to  us  to  be  of  interest  to  linguists. 

We have deliberately avoided aspects that, though particularly
important for MT, are still too specific and technical at
our present level of development. Examples are the problems of
MT dictionaries and dictionary search, of the morphological
analysis of words during MT, etc.


2.  Two  Approaches  to  Machine  Translation 

The  problems  and  methods  of  MT  are  variously  understood  by 
different  researchers.  Corresponding  to  differences  in  opinion 
in  the  MT  field,  there  are  two  methods  of  approach,  which  in 
foreign  literature  are  sometimes  tentatively  called  the  "95  per 
cent  approach"  and  the  "100  per  cent  approach." 

In  the  first  approach,  the  basic  and  final  purpose  of  research 
is  the  realization  of  machine  translation  of  scientific-technical 
texts  with  the  least  expenditure  of  time  and  effort.  The  quality 
of  the  translation  may  not  be  high;  it  suffices  if  the  greater  part 
of  the  translated  text  (hence,  the  name  "95  per  cent  approach") 
is  understandable  to  a  specialist.  For  this  reason  the  necessity 
of  complete  syntactic  analysis  is  denied;  a  text  is  comprehen- 
sible to  a  specialist  even  with  word-for-word  translation  (at 
least  for  certain  pairs  of  languages).  The  structure  of  the  lan- 
guage does  not  interest  the  researchers;  the  rules  for  transla- 
tion are  based  on  instances  encountered  in  the  texts  analyzed 
and  are  gradually  broadened  by  introduction  of  new  texts  and 
discovery  of  new  cases.  Such  rules  may  not  reflect  actual  lin- 
guistic regularities  and  may  even  contradict  them.  V.  H.  Yngve 
has  called  such  rules  "ad  hoc  rules." 

In the second approach, the study of the general structural
regularities of language that form the basis of concrete cases of
translation is put foremost. In other words, the researcher
tries to explain the possibilities and means used by language to
express  a  particular  thought.  The  rules  for  translation  are  for- 
mulated with  regard  to  the  possibilities  explained.  Realization 
of  a  translation  on  the  machine  is  considered  a  means  of  facili- 
tating knowledge  of  the  structure  of  language  (in  the  sense  in- 
dicated above— as  a  group  of  laws  according  to  which  spoken 
sequences  are  constructed).  This  is  necessary,  since  MT  is  not 
considered  to  be  an  end  in  itself  but  rather  the  first  step  in  solv- 
ing a  more  general  problem:  how  to  "teach"  electronic  compu- 
ters a  whole  series  of  operations  using  speech,  including  editing 
and  referencing  and  the  introduction  of  bibliographic  and  other 
corrections  to  texts. 

Much attention has been turned to syntactic and, more
recently, to semantic analysis. It is assumed that the ability
to explain the syntactic (and meaning) structure of a text
will allow us not only to improve the quality of machine
translations fundamentally but also to automate the operations
mentioned earlier that are connected with language.

An  important  place  is  assigned  to  purely  linguistic  studies  of 
language.  Thus,  for  example,  the  MT  group  at  the  Massachu- 
setts Institute  of  Technology  (U.S.A.)  is  working  out  a  special 
structural  grammar  of  German  and  a  parallel,  analogous  gram- 
mar of  English  in  order  to  determine  the  correspondence  be- 
tween these  languages.  "We  are  looking  at  language  in  a  new 
light— through  the  prism  of  the  'memory'  of  a  computer," 
write  two  members  of  this  group,  W.  N.  Locke  and  V.  H. 
Yngve,  "and  we  hope  that  our  work  on  language  structure  will 
yield  us  new  and  interesting  results"  [40]. 

The  "100  per  cent  approach"  demands  that,  although  he  base 
his  work  on  some  limited  text,  the  linguist  use  his  knowledge 
of  a  language  fully  in  formulating  rules  for  translation,  turning 
if  necessary  to  special  studies  (i.e.,  introducing  additional 
texts),  and  that  he  try  to  answer  all  questions  cardinally,  so 
that  his  solution  may  correspond  to  the  structural  possibilities 
of  the  language.  Rules  obtained  in  this  way  can  be  called  "gen- 
eral rules"  (as  opposed  to  the  "ad  hoc  rules"  mentioned  ear- 
lier). 

The difference between "ad hoc rules" and "general rules"
can be illustrated by an example from an article by A.
Koutsoudas and A. Humecky, "Ambiguity of Syntactic Function
Resolved by Linear Context" [38]. In this article, rules are given
for determining the syntactic function of Russian short-form
adjectives in -o (legko, bystro) and of comparative adjectives
in -e, -ee (legche, bystree). The rules are based on a large
number of examples (approx. 700). They originate from analysis
of "linear context" of three words (the short-form, one
preceding word, and one following) and ensure correct analysis of
nearly all initial examples.
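
To make the idea of a three-word "linear context" concrete, here is a
minimal sketch of how such rules might be stored and applied. The rule
entries below are invented purely for illustration and are not taken
from the article; the names AD_HOC_RULES and classify_by_linear_context
are likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rules keyed by (class of preceding word, class of following
# word). These entries are invented and do NOT reproduce the published rules.
AD_HOC_RULES: Dict[Tuple[str, str], str] = {
    ("noun", "END"): "predicate",
    ("verb", "noun"): "modifier",
}


def classify_by_linear_context(classes: List[str], index: int) -> Optional[str]:
    """Decide the role of the form at `index` from its two neighbors only."""
    before = classes[index - 1] if index > 0 else "START"
    after = classes[index + 1] if index + 1 < len(classes) else "END"
    # Any context not met in the examples examined has no rule at all.
    return AD_HOC_RULES.get((before, after))
```

A table of this kind answers correctly for the contexts it was built
from and returns nothing for a neighbor pair it has never seen, so it
can be extended only by adding further entries.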

However,  D.  S.  Worth,  in  his  critique  of  this  article  [62], 
cites  examples  contradicting  nearly  all  of  these  formulated  rules. 
This  is  explained,  as  Worth  says,  by  the  fact  that  Koutsoudas 
and  Humecky  had  not  studied  the  structural  laws  of  Russian 
syntax  but  had  simply  lumped  together  the  results  of  a  series 
of  translations  from  Russian  to  English.  In  this  they  used  su- 
perficial facts— the  character  of  two  adjacent  words.  Study  of  a 
larger  context  would  lead  to  magnification  of  the  number  of 
rules  until  they  enumerated  the  many  individual  instances. 

Worth's  criticisms  are  valid.  The  fault  does  not  lie  in  the 
fact  that  Koutsoudas  and  Humecky  did  not  study  some  exam- 
ples, or  that  they  did  not  have  enough  examples.  If  they  had 
tried  to  find  a  primary  general  solution,  using  only  their  own 
material,  they  would  probably  have  obtained  simpler  and, 
moreover,  more  effective  rules.  Obviously,  "general  rules"  must 
be  based  not  on  three-word  or  even  larger  "linear"  context  but 
on  knowledge  obtained  about  the  whole  sentence  at  the  first 
analysis.  (This  knowledge  is  needed  to  answer  many  questions 
in  translation  and  not  just  to  find  the  syntactic  function  of  the 
short-form  adjective.)  Thus,  if  there  is  an  "obvious"  predicate 
in  a  sentence  (i.e.,  whether  it  be  a  verb  in  the  personal  form, 
or  a  short-form  participle,  or  a  short-form  adjective,  not  neuter 
and  not  compared,  or  a  so-called  "predicative  word"  like  nyet 
[there  is  no],  mozhno  [can],  etc.),  then  the  form  in  -o  (or  -e, 
-ee)  under  consideration  can  only  be  a  modifier  and  must  be 
translated as an adverb.^ We note that it makes no difference in
such an approach, as opposed to that taken by Koutsoudas and
Humecky, where this "obvious" predicate is found, so no
auxiliary rules are needed for handling all possible instances of
inversion, substitution, etc. Furthermore, if there is no
"obvious" predicate in the sentence and the -o form under
consideration is an adjective of a definite semantic type (e.g., legko
[(it) is easy], estestvenno [(it) is natural], nepravil'no [(it)
is incorrect]), and there is an infinitive present in the sentence
but no possible infinitive-governing word, then the -o form is
the predicate (translated into English by "it is" + adj.), and
the infinitive is to be connected to the -o form (e.g., legko videt'
chto ... [it is easy to see that ...]). Here again, the mutual
word-order has no significance. Other rules for finding the
syntactic function of short-form adjectives in -o are formed
analogously.

^ For simplicity of illustration we have omitted the special case of the forms
budet [will, will be], bylo [was] (budet legko [(it) will be easy], bylo vozmozhno
[(it) was possible], etc.).

Such  "general  rules"  are  based  on  a  consideration  of  the 
principal  possibilities  (the  semantic  types  of  the  short-form  ad- 
jectives and  the  presence  or  absence  of  certain  types  of  words 
in  the  sentence).  These  rules  may  be  larger  in  volume  than 
those  of  Koutsoudas  and  Humecky,  but  with  a  little  increase 
in  volume,  they  increase  considerably  in  their  effectiveness.  In 
short,  "general  rules"  can  in  every  case  ensure  a  selection  that 
will  be  comprehensible  (to  a  human  being). 
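
The two "general rules" stated above for forms in -o can be sketched as
a procedure over the whole sentence. This is a simplified illustration,
not the authors' algorithm: the tag names, the Word class, and the
function role_of_o_form are our assumptions, and the special case of
budet and bylo is omitted here just as it is in the text.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Word:
    form: str
    tags: Set[str] = field(default_factory=set)  # hypothetical tag names


# Word types that count as an "obvious" predicate in the sense of the text.
OBVIOUS_PREDICATE_TAGS = {
    "finite_verb", "short_participle",
    "short_adjective_non_neuter", "predicative_word",
}


def role_of_o_form(sentence: List[Word], target: Word) -> str:
    """Classify a form in -o as 'modifier', 'predicate', or 'undecided'."""
    others = [w for w in sentence if w is not target]
    # Rule 1: an "obvious" predicate anywhere in the sentence makes the
    # -o form a modifier (translated as an adverb); position is irrelevant.
    if any(w.tags & OBVIOUS_PREDICATE_TAGS for w in others):
        return "modifier"
    # Rule 2: no obvious predicate, a suitable semantic type, an infinitive
    # present, and no other possible governor for that infinitive.
    has_infinitive = any("infinitive" in w.tags for w in others)
    has_governor = any("governs_infinitive" in w.tags for w in others)
    if "evaluative" in target.tags and has_infinitive and not has_governor:
        return "predicate"
    return "undecided"  # left to the further rules not spelled out in the text
```

Neither rule inspects the position of a word, so legko videt' and
videt' legko receive the same analysis; this is what distinguishes the
procedure from one based on linear context.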

"General  rules"  are,  of  course,  more  interesting  to  a  linguist. 
In  the  nature  of  things,  their  composition  will  lead  to  an  exact 
description  of  the  structure  of  language,  i.e.,  to  the  discovery 
of  laws  such  as  those  by  which  this  or  that  content  is  expressed 
in  language. 

In  general,  the  "100  per  cent  approach,"  with  its  broad  view 
of  MT,  is  more  closely  related  to  theoretical  linguistics  and  is 
apparently  able  to  function  better  in  solving  the  latter's  basic 
problems. 


3.  Syntactic  Analysis  in  Machine  Translation 

In  the  first  stages  of  MT's  development,  the  researcher's  atten- 
tion was  naturally  drawn  to  the  problems  of  word-for-word 
translation. 

In  word-for-word  translation,  the  machine  ascribes  to  each 
word  or  form  of  a  word  all  possible  translational  equivalents, 
using  a  dictionary  (or  a  dictionary  and  morphological  tables). 
Linguistic difficulties arising during such a translation are not
great  and  are  almost  entirely  reducible  to  technical  problems. 
Therefore,  it  is  entirely  understandable  that  the  history  of 
practical  work  in  MT  began  precisely  with  word-for-word 
translation. 
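
Word-for-word translation as described here amounts to a dictionary
lookup for each word. The sketch below assumes a toy dictionary whose
entries are our own illustrative choices; the names DICTIONARY and
word_for_word are hypothetical, and morphological tables are left out.

```python
from typing import Dict, List

# Toy Russian-English dictionary; the entries are illustrative assumptions.
DICTIONARY: Dict[str, List[str]] = {
    "legko": ["easily", "easy"],
    "videt'": ["to see"],
    "chto": ["that", "what"],
}


def word_for_word(text: str, dictionary: Dict[str, List[str]]) -> List[List[str]]:
    """Ascribe to each word every equivalent listed in the dictionary."""
    # A word missing from the dictionary is passed through unchanged.
    return [dictionary.get(word, [word]) for word in text.split()]
```

The result is a list of alternatives per word rather than a finished
sentence; choosing among the alternatives is exactly the step that
calls for contextual or syntactic analysis.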

During  the  past  five  or  six  years,  in  the  U.S.S.R.,  in  the 
U.S.A.,  and  in  England,  several  experiments  in  word-for-word 
translation  have  been  conducted  using  machines;  e.g.:  Russian- 
English  translation  in  the  Computation  Laboratory  of  Harvard 
University  (Oettinger's  group);  French-English  in  Birkbeck 
College  [23];  French-Russian  at  the  Mathematics  Institute  of 
the  Academy  of  Sciences,  U.S.S.R.  ([6],  [7],  [8]).  (The 
French-Russian  translations  were  not  purely  word-for-word; 
the  algorithm  employed  contextual  analysis  to  distinguish  hom- 
onyms, etc.,  though  not  systematically.)  The  results  have  shown 
that  word-for-word  translation  is  suitable  as  a  first  approxima- 
tion for  definite  pairs  of  languages  and  for  specialized  texts.  In 
some cases, it is useful for direct application.* But even in these
cases,  word-for-word  translation  is  in  need  of  considerable  im- 
provement. 

On  the  other  hand,  for  certain  pairs  of  languages  (e.g.,  Ger- 
man-English and  English-Russian),  word-for-word  translation 
is  generally  impossible;  in  such  cases,  it  is  necessary  to  base  the 
translation  on  a  syntactic  analysis  consisting  of  a  determination 
of  the  bonds  between  words  and  between  parts  of  the  sentence. 

Syntactic  analysis  gives  machine  translation  an  enormous  po- 
tential for  improvement. 

The  truth  of  this  fact  has  long  been  recognized;  one  of  the 
first  publications  on  MT  (in  1951!)  was  Oswald  and  Fletcher's 
remarkable  article  on  the  syntactic  analysis  of  German  text  for 
translation  into  English  [47].  The  authors  had  formulated  sim- 
ple and,  at  the  same  time,  quite  effective  rules  for  automatic 
analysis  of  the  syntactic  structure  of  German  sentences.  Their 
approach  essentially  foreshadowed  the  direction  of  research  in 
this  area. 

* See examples of French-Russian machine translation in [7].

In developing the ideas of Oswald and Fletcher, Victor Yngve
proposed (in 1955) an interesting methodology that yields a
very general solution to the problem of syntactic analysis (see
[63]). Immediately after Yngve, there followed work on various
aspects of syntactic analysis by many scientists abroad (the
Cambridge  MT  group  in  England,  the  MT  group  of  The 
RAND  Corporation,  the  collaborators  of  the  Georgetown  group 
in  the  U.S.A.,  and  others),  and  in  the  U.S.S.R.  (the  MT  groups 
at  the  Mathematics  Institute  (MI),  the  Institute  of  Precise 
Mechanics  and  Computer  Techniques  (IPMCT),  the  Linguis- 
tics Institute  (LI),  Leningrad  University  (LU),  and  the  Acad- 
emies of  Science  of  Georgia  and  Armenia). 

We  shall  not  give  a  detailed  description  of  the  activities  of 
each  group  mentioned  but  shall  limit  ourselves  to  a  survey  of 
the  general  state  of  recent  work  on  the  automation  of  syntactic 
analysis,  citing  only  the  most  interesting  aspects. 

We  note  especially  that  in  MT  the  term  "syntactic  analysis" 
is  rather  widely  understood  and  accepted,  though  insufficiently 
defined.  Syntactic  analysis  includes  the  determination  of  bonds 
among  words,  the  determination  of  the  character  of  these  bonds, 
the  hierarchy  of  individual  groups  of  words,  the  relations  among 
the  parts  of  a  complex  sentence,  etc.  Unfortunately,  special  re- 
search that  would  define  the  term  exactly  and  establish  the 
boundaries  of  syntactic  analysis  has  not  been  undertaken  by 
anyone.  We  shall,  therefore,  use  the  words  "syntactic  analysis" 
in  the  usual  broad  and  rather  fuzzy  meaning  (as  primarily  in- 
tending to  determine  the  bonds  among  words  of  a  complex 
sentence). 

Many  researchers  base  syntactic  analysis  on  a  list  of  typical 
phrase-types  (or  constructions).  These  typical  phrases  are  de- 
scribed in  terms  of  previously  defined  classes  of  words.  To  be- 
gin with,  word-class  attributes  are  ascribed  to  all  the  words  in 
a  text  with  the  aid  of  a  special  dictionary.  Then  the  machine, 
comparing  the  text  with  the  list  of  phrase-types  (i.e.,  with  the 
list  of  minimal  word-class  sequences),  finds  specific  phrases  in 
the  text,  and  thus  determines  the  bonds  among  the  words. 
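A minimal sketch of this procedure follows. The word classes, the configurations, and the representation of a bond as a (dependent, governor) pair are all assumptions chosen for illustration; real configuration lists are far larger. For simplicity the sketch tries every configuration at every position rather than stopping at the first fit.

```python
# Hypothetical "special dictionary" assigning a word class to each word.
WORD_CLASS = {
    "the": "DET", "new": "ADJ", "machine": "N",
    "translates": "V", "text": "N",
}

# Hypothetical configuration list. Each entry pairs a word-class sequence
# with the bonds it implies, written as (dependent index, governor index)
# relative to the start of the matched sequence.
CONFIGURATIONS = [
    (("DET", "ADJ", "N"), [(0, 2), (1, 2)]),
    (("DET", "N"), [(0, 1)]),
    (("ADJ", "N"), [(0, 1)]),
    (("N", "V"), [(0, 1)]),
    (("V", "N"), [(1, 0)]),
]


def find_bonds(words):
    """Compare the text with every configuration at every position."""
    classes = [WORD_CLASS[w] for w in words]
    bonds = set()
    for start in range(len(words)):
        for pattern, links in CONFIGURATIONS:  # sequential scan of the list
            end = start + len(pattern)
            if tuple(classes[start:end]) == pattern:
                for dep, gov in links:
                    bonds.add((words[start + dep], words[start + gov]))
    return bonds


if __name__ == "__main__":
    sentence = ["the", "new", "machine", "translates", "text"]
    for dependent, governor in sorted(find_bonds(sentence)):
        print(dependent, "->", governor)
```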

This  method  was  proposed  by  Victor  Yngve  (U.S.A.)  and,  in- 
dependently, by  R.  Richens  (England).  In  the  U.S.S.R.,  T.  N. 
Moloshnaya  was  the  first  to  apply  it  [17]  for  constructing  an 
algorithm  for  English-Russian  translation,  using  a  dictionary 
of  "configurations"  (as  typical  phrase-types  are  called). 

A dictionary (list) of elementary syntactic constructions
(about 7,000 entries) is applied in Harper and Hays'
Russian-English algorithm [35]. Dictionaries of configurations
are applied by the majority of Soviet researchers (the MT groups at
LU,  IPMCT,  LI,  and  the  Georgian  Academy  of  Sciences). 

Several  basic  questions  about  syntactic  analysis,  as  realized  by 
cutting  text  into  the  simplest  typical  phrases,  are  considered  in 
T.  N.  Moloshnaya's  work  [15]  (for  English)  and  in  M.  V.  So- 
fronov's  [20]  (for  Chinese). 

The  application  of  dictionaries  of  configurations  permits  the 
creation  of  a  universal  algorithm  for  syntactic  analysis  suitable 
for  most,  if  not  all,  languages.  Between  languages,  only  the  con- 
tent of  the  configuration  dictionary  changes,  while  its  general 
form  and  the  rules  for  a  search  of  configurations  in  text,  using 
this  dictionary,  remain  the  same.  (Analogously,  rules  for  dic- 
tionary lookup  do  not  change  for  various  languages.)  The  gen- 
eral form  of  a  configuration  dictionary  and  a  corresponding 
universal  algorithm  for  syntactic  analysis  are  being  developed 
at  LI. 

In  order  to  denote  typical  phrases,  it  is  first  necessary  to  clas- 
sify words  in  a  way  that  does  not  correspond  with  the  tradi- 
tional division  into  parts  of  speech.  The  number  of  such  classes, 
in  some  algorithms,  amounts  to  several  dozen  or  even,  in  a  few 
cases,  to  hundreds.  Then,  the  number  of  typical  phrases  be- 
comes several  thousand. 

But  an  approach  is  possible  in  which  a  single,  constant  dis- 
tribution of  words  into  classes  in  general  does  not  obtain.  In 
place  of  one  class  indicator,  a  series  of  indicators  is  written  for 
each  word,  characterizing  all  words  for  all  their  interesting  as- 
pects. Word  groupings  can  be  constructed  using  any  combina- 
tion of  indicators.  When  we  need  to  define  a  class  of  words  in 
order  to  apply  some  rule,  we  indicate  that  class  by  the  necessary 
attributes  and  their  meanings.  Similar  indications  are  included 
in  the  formulation  of  the  rules  (in  the  list  of  configurations); 
thus,  word  classes  are  formed  specifically  for  each  rule.  This 
approach  is  used  by  LI  in  its  algorithm  for  syntactic  analysis. 

This plan for grouping words can be called a "sliding classification."
A "sliding classification" is suitable wherever one can, by choosing
various combinations of indicators, obtain a large number of word
classes of any size. One can select the
indicators so that a class will consist of just one concrete word
form;  one  can  also,  by  taking  another  combination  of  indicators, 
construct  a  class  that  includes  a  very  large  number  of  forms. 
The  same  words  can  belong  to  one  class  with  respect  to  one  set 
of  indicators  and  to  another  class  with  respect  to  another  set. 
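The idea can be illustrated with a small sketch in which each word carries a set of indicators and a class is formed only when a rule asks for it. The transliterated word forms, the indicator names, and the function `word_class` are assumptions made for the example; they do not reproduce the LI algorithm.

```python
# Each word form carries a series of indicators instead of one class label.
# The forms and indicator names are illustrative assumptions.
LEXICON = {
    "dom": {"pos": "noun", "case": "nom", "number": "sg", "animate": False},
    "doma": {"pos": "noun", "case": "gen", "number": "sg", "animate": False},
    "student": {"pos": "noun", "case": "nom", "number": "sg", "animate": True},
    "novyj": {"pos": "adj", "case": "nom", "number": "sg"},
    "chitaet": {"pos": "verb", "number": "sg", "transitive": True},
}


def word_class(**required):
    """Form a class on demand: all words whose indicators have the
    required values. A rule names only the indicators it cares about."""
    return {
        word
        for word, indicators in LEXICON.items()
        if all(indicators.get(name) == value for name, value in required.items())
    }


if __name__ == "__main__":
    print(sorted(word_class(pos="noun")))
    print(sorted(word_class(pos="noun", animate=True)))
    print(sorted(word_class(case="nom", number="sg")))
```

The same form ("student") falls into a narrow class under one combination of indicators and into a broad class, shared with an adjective, under another.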

The "sliding classification" permits a considerable decrease
in the number of configurations, to several hundred instead of
several thousand. "Sliding classification" is also of considerable
interest  from  a  theoretical  standpoint.  It  is  possible  that  the 
notorious  problem  of  the  parts  of  speech  can  be  studied  anew 
in  the  light  of  a  consistent  development  of  "sliding  classifica- 
tion." 

In  syntactic  analysis,  many  machine  operations,  and  conse- 
quently much  time,  are  spent  searching  the  configuration  dic- 
tionary. Configurations  are  compared  with  the  text  sequentially, 
one  after  the  other,  until  one  of  them  "fits"  the  phrase  being 
analyzed.  Such  iteration  of  configurations  seems  uneconomical, 
and  we  would  like  to  do  away  with  it.  An  alternate  method  has 
been  suggested  by  the  collaborators  of  the  Cambridge  MT 
group  [44]. 

Source-text  elements  that  possess  the  characteristic  of  pre- 
dicting groups  of  a  certain  type  ("structures,"  as  the  Cambridge 
unit  has  decided  to  call  such  groups)  are  studied.  These  ele- 
ments are  called  "operators."  An  "operator"  has  ascribed  to  it, 
in  the  dictionary  itself,  the  number  of  the  structure  that  it  pre- 
dicts and  an  indication  of  its  position  in  this  structure.  Once 
the  machine  encounters  this  "operator"  in  text,  it  immediately 
turns  to  the  proper  structure  (in  a  list  of  structures)  and  then 
searches  the  text  for  the  remaining  elements.  Here,  the  ma- 
chine does  not  have  to  search  the  whole  list  of  structures. 
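A rough sketch of operator-driven lookup is given below. The structures, the dictionary entries, and the simplifying assumption that the remaining elements stand next to the operator are all hypothetical; the Cambridge group's actual structures are not reproduced here.

```python
# Hypothetical list of structures, keyed by structure number.
STRUCTURES = {
    7: ("PREP", "DET", "N"),
    12: ("AUX", "PART"),
}

# Hypothetical dictionary. An operator carries, in its own entry, the
# number of the structure it predicts and its position in that structure.
DICTIONARY = {
    "he": {"cls": "PRON"},
    "has": {"cls": "AUX", "operator": (12, 0)},
    "gone": {"cls": "PART"},
    "into": {"cls": "PREP", "operator": (7, 0)},
    "the": {"cls": "DET"},
    "house": {"cls": "N"},
}


def find_structures(words):
    """Consult only the structure predicted by each operator encountered."""
    classes = [DICTIONARY[w]["cls"] for w in words]
    found = []
    for i, word in enumerate(words):
        operator = DICTIONARY[word].get("operator")
        if operator is None:
            continue  # ordinary words trigger no lookup at all
        number, position = operator
        pattern = STRUCTURES[number]
        start = i - position
        # Assumption: the remaining elements are adjacent to the operator.
        if start >= 0 and tuple(classes[start:start + len(pattern)]) == pattern:
            found.append((number, start))
    return found


if __name__ == "__main__":
    print(find_structures(["he", "has", "gone", "into", "the", "house"]))
```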

Similar  ideas  are  being  developed  by  Garvin's  group  (U.S.A.) 
[28].  Here,  special  attention  is  devoted  to  so-called  "decision 
points,"  or  "fulcra." 

A  fulcrum  is  the  element  of  a  syntactic  unit  that  determines 
the  presence  and  character  of  this  unit.  The  fulcrum  of  a  sen- 
tence is  a  predicate  group,  while  the  fulcrum  of  the  predicate  is 
a  verb  in  the  personal  form,  or  a  short-form  adjective  (in  Rus- 
sian), etc.  To  each  fulcrum  correspond  specific  syntactic  rules, 
which  are  only  applied  when  that  fulcrum  is  discovered.  Fulcra 
are comparable to the operators of the Cambridge group.

In its algorithm for syntactic analysis (of Russian), LI applies
a similar method. To each word are ascribed the "addresses"
of the first (in list order) configurations into which
this  word  can  enter.  There  are  two  such  "addresses."  (Ad- 
dresses are  numbers,  the  ordinal  numbers  of  the  configura- 
tions.) 

The  first  "address"  is  ascribed  to  the  word's  root  in  the  dic- 
tionary; it  is  based  on  the  nature  of  the  root  itself  (its  lexical 
meaning,  its  capacity  to  predict  some  word  or  form,  etc.).  The 
second  "address"  is  produced  during  morphological  analysis; 
it  is  based  on  the  form  of  the  word.  Reference  to  the  configura- 
tion list  is  always  made  through  the  "addresses"  of  words.  Pro- 
ceeding from  left  to  right  through  the  phrase  being  analyzed, 
each  word  is  looked  at  in  turn,  and  according  to  its  first  ad- 
dress, a  particular  configuration  is  selected  for  comparison  with 
the  phrase  under  analysis.  For  each  configuration,  "addresses" 
are  indicated  for  the  series  of  configurations  to  which  the  "op- 
erating" word  must  refer,  depending  on  the  results  of  compari- 
son (whether  the  given  configuration  had  "fit"  the  phrase  be- 
ing analyzed).  After  the  comparison,  the  operating  word  is  "re- 
addressed,"  then  the  next  word  is  taken,  and  the  whole  process 
is  repeated  from  the  beginning.  In  this  way,  search  through  the 
whole list of configurations is avoided.*

The  consequences  of  syntactic  analysis  of  complex  sentences 
are of special interest. Analysis can proceed by splitting a
complex sentence into its component parts (simple sentences,
independent elements, etc.). For this purpose, punctuation and
certain  words,  mainly  subordinating,  are  noted  specially.  Syn- 
tactic analysis  proper  (determination  of  bonds  among  words) 
is  conducted  within  each  separate  part.  Oswald  and  Fletcher 
[47]  proposed  this  method;  the  IPMCT  algorithm  [18]  uses  it 
in  analyzing  Russian. 

* Many details have been omitted for the sake of simplicity of presentation.

Yet another approach is possible: The splitting of a sentence
into parts is not effected initially but during determination of
the connections among words; this splitting is not the beginning
but the end of analysis. This approach is used in the LI
algorithm. The phrase being analyzed is split into "segments"
according to its punctuation and certain conjunctions (without
any special analysis of the punctuation or conjunctions themselves),
so that the segments do not correspond initially to the
meaningful parts of the phrase but are purely formally separated
sections.*

Syntactic  analysis  is  performed  within  each  section  so  ob- 
tained with  the  aid  of  the  configuration  list.  The  initial  split- 
ting into  segments  is  necessary  to  avoid  forming  false  relations 
between  words  in  one  part  of  a  phrase  and  words  in  another 
part.  However,  this  splitting,  while  saving  us  from  false  corre- 
lations, hinders  the  determination  of  many  true  bonds,  since 
connected  words  can  belong  to  different  segments  at  first. 

Therefore,  when  a  word  is  isolated,  i.e.,  when  there  is  no 
obligatory  bond  to  be  found  for  it  within  a  segment,  then  the 
segment  as  a  whole  takes  on  a  special  designation:  an  indica- 
tion of  what  bond  has  not  been  made  for  which  word.  Thus, 
for example, if a transitive verb (e.g., peremeshchaet [shifts])
is "separated" from its modifiers (e.g., elementy [elements])
as follows:

Segment I
"Vse elementy
[All elements],

Segment II
kotorye prinadlezhat A
[which belong to A],

Segment III
eto dvizhenie peremeshchaet v novoe polozhenie . . ."
[this movement shifts to a new position].


then  segment  III  will  be  marked  with  an  indication  that  for  its 
third  word  there  is  "missing"  a  substantive  in  the  accusative, 
and  segment  I  will  be  marked  for  the  "absence"  of  a  governing 
word;  i.e.,  there  is  an  "excess"  in  segment  I  of  a  substantive  in 
the  nominative-accusative  case. 


* Several weaknesses (the periods in abbreviations, etc.) were omitted to
simplify the explanation.

The idea of using such designations was advanced by G. B.
Chikoidze  (in  Tbilisi).  It  has  proved  fruitful.  In  its  algorithm 
for  analyzing  Russian,  LI  uses  only  some  twenty  such  designa- 
tions, indicating  the  "absence"  or  the  "excess"  of  words  of  a 
particular  type  in  a  segment. 

When  analysis  within  a  segment  is  finished,  segments  are 
compared  with  each  other  for  resultant  designations,  so  that 
the  "excess"  words  in  certain  segments  can  be  connected  with 
the  corresponding  "unsatisfied"  words  in  other  segments.  As  a 
result,  some  of  the  boundaries  between  segments  are  removed 
and  a  primary  unification  of  segments  obtains.  Analysis  is  re- 
peated, if  necessary,  with  respect  to  the  configurations  within 
the  enlarged  segments  and  then  a  comparison  is  made  of  the  seg- 
ments for  their  designations,  etc.,  until  bonds  have  been  estab- 
lished among  all  the  words.  Then,  the  segments  will  correspond 
to  actual  parts  of  the  complex  sentence.  At  this  point,  on  the 
basis  of  a  consideration  of  conjunctions  and  of  knowledge  of  the 
structure  of  each  segment  obtained  during  analysis,  the  bonds 
among  the  segments  and  their  hierarchy  can  be  established. 
Here,  analysis  is  completed. 
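The comparison of segments by their designations can be sketched as follows, using the example given earlier. The `Segment` type, the encoding of designations as (word, case) pairs, and the function names are assumptions made for illustration; the sketch records the bond but leaves out the removal of segment boundaries and the repeated analysis of the enlarged segments.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    words: list
    # "Absence" designations: (word, case of the substantive it lacks).
    missing: list = field(default_factory=list)
    # "Excess" designations: (word, set of cases its form may represent).
    excess: list = field(default_factory=list)


def link_segments(segments):
    """Connect "excess" words with "unsatisfied" words in other segments."""
    bonds = []
    for needy in segments:
        for word, needed_case in list(needy.missing):
            for donor in segments:
                if donor is needy:
                    continue
                match = next(
                    (item for item in donor.excess if needed_case in item[1]),
                    None,
                )
                if match is not None:
                    bonds.append((match[0], word))  # (dependent, governor)
                    donor.excess.remove(match)
                    needy.missing.remove((word, needed_case))
                    break
    return bonds


def example_segments():
    """The three segments of the sample phrase, with their designations."""
    first = Segment(["Vse", "elementy"], excess=[("elementy", {"nom", "acc"})])
    second = Segment(["kotorye", "prinadlezhat", "A"])
    third = Segment(
        ["eto", "dvizhenie", "peremeshchaet", "v", "novoe", "polozhenie"],
        missing=[("peremeshchaet", "acc")],
    )
    return [first, second, third]


if __name__ == "__main__":
    print(link_segments(example_segments()))
```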

The general organization of the analysis is a separate question.*
In several projects, separate steps have been used following
glossary lookup, consisting of morphological analysis, the
finding  of  idioms,  resolution  of  homographs,  treatment  of  words 
with various peculiarities, etc. For example, the French-Russian
([6], [8]) and Hungarian-Russian [12] algorithms of
MI and LI, and the Georgetown University algorithm,
"SERNA" ("S Russkogo Na Anglijskij," from Russian to English)
[59], are so constructed. During later research it developed,
however, that the indicated stages are not basically different
from syntactic analysis. Actually, idiom determination in
text  is  the  same  as  the  determination  of  phrase  types,  and  reso- 
lution of  homonyms  is  made  on  the  basis  of  determination  of 
the  bonds  among  words.  For  this  reason,  the  LI  algorithm  for 
syntactic analysis of Russian text includes not only idiom
determination but also homonym resolution and treatment of
special words. Idioms and the rules for resolving homonyms are
simply special configurations in the general configuration list.
This unique approach has allowed a reduction of all procedures
to a small group of rote operations, which seemed convenient
from the standpoint of constructing an algorithm and of
programming.

* About MT analysis, see p. 61 below.


4.  The  Problem  of  Meaning  in  Machine  Translation 

Since  the  purpose  of  machine  translation,  or  translation  of  any 
kind,  is  transformation  of  text  in  such  a  way  that  its  meaning 
is  preserved  (more  or  less),  work  on  MT  cannot  omit  a  study 
of  meaning  and  the  level  of  content  of  languages.  It  is  some- 
times said  that  MT  banishes  meaning  as  an  object  of  research, 
that  the  machine  cannot  make  use  of  meaning  characteristics. 
These  assertions  are  simply  untrue.  The  machine  can  use  any 
characteristics,  including  those  involving  meaning,  if  only  they 
are  clearly  described  and  enumerated  beforehand.  Isolation  and 
description  of  the  necessary  meaning  characteristics  is,  in  fact, 
one  of  the  most  important  problems  in  MT.  However,  the  ma- 
chine cannot  at  present  actually  make  use  of  the  various  extra- 
linguistic  factors  connected  with  meaning  (the  correlation  of 
language  elements  with  the  objects  of  real  activity,  psychologi- 
cal associations,  etc.),  since  such  questions  have  not  been 
treated.  The  machine  operates  only  with  what  is  immediately 
contained  in  the  text.  Therefore,  a  purely  linguistic  description 
of  meanings  must  be  made  for  MT:  The  meaning  of  an  ele- 
ment is  describable  by  its  substitutability  (how  it  fits  into  syn- 
onymous series  or  into  groups  of  translational  equivalents  in 
various languages)* and by its distribution (the appearance of
the  element  in  specified  kinds  of  context).  This  approach  is 
not  the  special  property  of  MT;  in  fact,  meaning  must  be  stud- 
ied by  the  same  methods  in  linguistics  as  well.  Here,  of  course, 
the  productivity  of  other  approaches  is  not  denied,  particularly 
the  psychological  approach.  It  is  important  only  to  distinguish 
clearly  the  linguistic  and  nonlinguistic  approaches.  MT  forces 
this  distinction  to  be  made  very  logically. 


* See below, pp. 65-66, on the "thesaurus method."

In the light of MT studies, we can consider anew such classic
linguistic  questions  as  that  of  homonymy  and  synonymy.  Thus, 
from  the  MT  point  of  view,  one  can  speak  of  homonymy  when 
the  same  sequence  of  elements  (e.g.,  letters)  must,  for  the  sake 
of  satisfactory  translation,  be  treated  variously.  The  distinction 
between  homonymy  and  polysemy  is  not  made  at  this  time, 
since  it  makes  no  difference  to  the  machine  at  MT's  present 
stage  of  development  whether  or  not  there  is  any  connection  in 
meaning  between  two  possible  translations  of  a  particular  word. 
Later,  when  we  have  more  complete  systems  of  "semantic  fac- 
tors" (see  p.  66),  this  distinction  will  become  essential,  and  its 
value  will  be  exactly  measured  by  a  group  of  general  "seman- 
tic factors"  constituting  the  meanings  of  the  two  words  com- 
pared. 

Unfortunately,  general  theoretical  questions  connected  with 
research  on  the  meaning  aspect  of  language  for  MT  purposes 
have  not  been  treated  at  all.  For  this  reason,  we  shall  limit  our 
discussion  to  one  of  the  practical  aspects  of  the  broad  theme: 
"meaning  in  MT."  We  have  in  mind  the  problem  of  multi- 
valence. 

Elimination  of  the  multivalence  of  language  elements  (words, 
grammatical  indicators),  in  its  broadest  sense  (including  the 
various  cases  of  homonymy,  see  above),  is  a  basic  problem  of 
MT  in  its  more  general  form.  Multivalence  on  the  MT  level 
means  the  presence  of  several  translations;  the  removal  of  mul- 
tivalence is  the  choice  of  the  necessary  equivalent  from  among 
the  several  possible  ones.  If  multivalence  did  not  exist,  and  the 
machine  did  not  have  to  make  such  a  choice,  then  MT  would 
be  reduced  to  very  simple  transformations. 

The  problem  of  multivalence  of  language  elements  (mainly 
that  of  words)  is  constantly  being  discussed  in  MT  studies. 
Many  suggestions  have  been  made  concerning  automatic  elimi- 
nation of  lexical  multivalence.  They  can  be  grouped  as  fol- 
lows: 

(1)  Limitation  of  multivalence  according  to  subject-matter. 
It  is  proposed  to  apply  special  idioglossaries  in  which  words  are 
given  only  the  meanings  applicable  to  them  within  a  given 
field.  One  could  also  furnish  each  translation  of  a  word  with  a 
code indicating the area in which it is applicable.

(2) Reducing the number of translations by choosing the
most  general  translations  (i.e.,  those  that  can  be  stretched  to  fit 
all  instances  and  still  not  confuse  the  meaning  of  a  text,  though 
weakening  the  style)  or  the  most  probable  (the  most  frequent) 
translations. 

(3)  Context  analysis.  Interesting  research  by  A.  Kaplan 
[37]  has  shown  experimentally  that  context,  even  when  under- 
stood to  be  simply  adjacent  words,  possesses  considerable 
"force"  for  removing  multivalence.  Obviously,  if  by  the  context 
of  a  multivalent  word  we  mean  "words  immediately  connected 
syntactically  with  the  given  word,"  then  the  "differentiating 
force"  of  such  context  will  be  still  greater.  For  just  this  reason, 
V.  H.  Yngve  proposed  a  solution  of  the  problem  of  lexical  mul- 
tivalence based  on  a  previously  developed  syntactic  structure 
for  the  sentence  being  translated  [64].  This  solution  seems  to 
be  the  most  productive.  First,  the  attributes  of  various  mean- 
ing-categories (object,  person,  action,  condition,  organization, 
etc.)  are  ascribed  to  words;  the  translation  of  the  multivalent 
word  is  chosen  using  rules  indicating  which  of  these  attributes 
in  words  syntactically  connected  with  the  given  word  correspond 
to  the  choice  of  this  translation.  Something  similar  is  done  in 
applying  the  "thesaurus  method"  (see  pp.  65-66). 

A  special  case  of  the  use  of  context  for  removing  multiva- 
lence is  the  discovery  of  idioms  having  a  special  translation. 

(4)  The  most  "powerful,"  but  at  the  same  time  an  extremely 
complex,  means  of  removing  ambiguity  consists  of  giving  the 
machine  so  many  designations  of  meaning  and  the  connections 
among  them  that  it  can  "understand"  the  content  of  a  text  (in 
the  broad  sense  of  the  word).  Then,  besides  syntactic  bonds, 
the  machine  can  in  translating  make  use  of  the  meaning  rela- 
tions—rules showing  the  permissible  combinations  of  semantic 
designations.  Given  such  a  capability,  the  machine  can  correct 
faulty  text  (with  typographical  errors,  omissions,  faults)  by  the 
meaning. 
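Method (3) in the list above, combined with the "most probable translation" fallback of method (2), can be sketched as follows. The English-Russian example, the meaning-category assignments, and the rule format are assumptions made for illustration; they are not drawn from Yngve's work or from any cited algorithm.

```python
# Hypothetical meaning-category attributes ascribed to words.
CATEGORY = {
    "workers": "person",
    "director": "person",
    "water": "object",
    "soil": "object",
}

# Hypothetical rules for a multivalent word: each candidate translation is
# tied to the categories of syntactically connected words that select it.
RULES = {
    "plant": [
        ({"person", "organization"}, "zavod"),   # "factory"
        ({"object"}, "rastenie"),                # "plant" in the botanical sense
    ],
}

# Assumed most frequent translation, used when context decides nothing.
MOST_FREQUENT = {"plant": "zavod"}


def choose_translation(word, linked_words):
    """Pick a translation from the categories of syntactically linked words."""
    linked_categories = {CATEGORY[w] for w in linked_words if w in CATEGORY}
    for categories, translation in RULES.get(word, []):
        if categories & linked_categories:
            return translation
    # Fall back to the most frequent translation, or to the word itself.
    return MOST_FREQUENT.get(word, word)


if __name__ == "__main__":
    print(choose_translation("plant", ["workers"]))
    print(choose_translation("plant", ["water"]))
    print(choose_translation("plant", ["quickly"]))
```

The list of linked words is assumed to come from a prior syntactic analysis; the sketch does not compute it.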

Special  work  is  being  done  for  transition  to  such  semantic 
analysis with the purpose of obtaining a sufficiently full collection
of the simplest semantic elements, such that through combinations
of these, one can represent the meanings of any language
units. Such elements have been called "semantic factors"
[3]. Semantic factors are necessary not only for MT but
also for many other operations on text, especially referencing
and  correction,  as  well  as  for  encoding  scientific-technical  in- 
formation to  be  stored  and  operated  upon  by  so-called  informa- 
tion machines. 

Several  groups  are  working  on  extracting  semantic  factors  for 
texts  in  various  fields  of  knowledge.  We  cite  in  particular  J. 
Perry  and  A.  Kent's  group  in  the  U.S.A.,  the  Cambridge  group 
in  England,  and  the  MT  Laboratory  at  the  First  Moscow  State 
Pedagogical  Institute  of  Foreign  Languages. 

We  shall  not  treat  in  detail  the  question  of  a  method  for  ex- 
panding meanings  into  semantic  factors.  Basically,  this  method 
consists  of  defining  semantic  factors  by  determining  the  corre- 
spondences among  the  various  elements  both  within  one  lan- 
guage and  between  languages.  Later,  when  we  discuss  inter- 
lingua and,  in  particular,  the  specification  of  the  semantic  ele- 
ments of  an  interlingua,  we  shall  describe  one  of  the  methods 
applied— the  so-called  "thesaurus  method"  (see  pp.  65-66). 

The  construction  of  sets  of  semantic  factors  is  especially  val- 
uable for  linguistics  because  it  permits  the  study  of  meanings 
as  systems,  i.e.,  as  units  formed  by  definite  rules  from  a  small 
number  of  simpler  elements. 


5.  Interlingua 

The  problem  of  interlingua  for  MT,  formulated  at  an  earlier 
stage of MT's development, is frequently discussed in the literature
and in MT publications.* Nevertheless, it is far from a
final  solution;  moreover,  complete  clarity  has  not  as  yet  been 
attained  in  several  general  representations  of  interlingua.  We 
shall  confine  ourselves  to  a  short  resume  of  some  of  the  ideas 
expressed  on  this  subject. 

In  nonliteral  MT  (and  frequently  also  in  word-for-word  MT 
—see  V.  H.  Yngve's  remarks  on  p.  64),  the  translation  process  is 
separated  into  two  stages:  analysis  and  synthesis. 

* E. Reifler's paper at the first MT conference, 1952 [51].

In analysis, specific data about the text being translated
(information about the translations of words, their morphological
forms, the connections among words, etc.) are extracted from
it. These data express the same meaning* as the input text but
explicitly  and  unambiguously,  unlike  the  language  elements, 
which  are  connected  with  the  meaning  inexplicitly  and  ambig- 
uously (meaning  may,  for  example,  be  expressed  by  the  rela- 
tive distribution  of  the  language  elements).  The  set  of  data  we 
can  obtain  from  analysis  is  so  arranged  that,  by  referring  to  it, 
we  can  construct  an  output  text.  Constructing  texts  from  an- 
alysis data  is  the  converse  of  analysis  and  is  called  synthesis. 

For  every  language,  data  are  collected  consisting  of  the  char- 
acteristics needed  for  a  unique  and  explicit  expression  of  the 
meaning  of  texts  in  this  language.  These  characteristics  are,  on 
the  one  hand,  the  goal  and  result  of  analysis  and,  on  the  other, 
the  raw  material  for  synthesis.  The  set  of  characteristics  is  de- 
veloped for  a  concrete  language  with  the  introduction  of  its 
grammatical  categories  and  others  necessary  and  convenient  for 
translation  of  the  information.  This  set  is,  in  fact,  the  unique 
"intermediary  language." 

In  binary  translation  (from  one  language  to  another  in  a 
given  direction),  analysis  of  the  input  language  is  performed 
immediately  in  terms  of  the  characteristics  of  the  output;  this 
is  so-called  "dependent  analysis."  For  example,  in  French-Rus- 
sian translation,  the  cases  of  nouns  are  immediately  determined 
during  analysis  of  the  French  text,  since  these  characteristics 
are  needed  for  synthesizing  the  Russian  text. 

But  in  multiple  translation  (from  many  languages  to  many 
others  in  any  direction),  such  an  approach  is  not  very  useful; 
as  many  analysis  algorithms  are  needed  for  each  input  language 
as  there  are  output  languages  proposed  (each  algorithm  leads 
from  the  text  in  the  input  language  to  the  characteristics  of 
one  of  the  output  languages).  Thus,  in  "dependent  analysis" 
ten  languages  would  need  ninety  analysis  algorithms  (nine  "de- 
pendent analyses"  for  every  language)  and  ten  synthesis  algo- 
rithms (since  synthesis  is  always  independent). 

* Or rather, almost the same; some loss of information may occur.

In order to avoid a large number of algorithms, we can apply
"independent analysis": For each language there is just one
analysis algorithm leading from the text in the input language
to the characteristics of this language, and one synthesis algorithm
performing the converse operation. In addition, there is
a set of rules by which the characteristics of the input language
derived  from  analysis  are  transformed  into  the  characteristics 
of  the  output  language  needed  for  synthesis.  This  set  of  rules 
is  also  an  interlingua.  For  example,  the  interlingua  of  the  MT 
group  at  M.I.T.   ([63],   [64])  can  be  understood  in  this  way. 
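The arithmetic behind the two schemes can be checked with a short sketch. The function names are assumptions; the counts follow directly from the description above: n - 1 dependent analyses for each of n languages, against a single independent analysis per language. The transformation rules required by independent analysis are not counted.

```python
def dependent_counts(n):
    """Algorithms needed for n languages under "dependent analysis"."""
    # Each language needs one analysis per possible output language.
    return {"analysis": n * (n - 1), "synthesis": n}


def independent_counts(n):
    """Algorithms needed for n languages under "independent analysis".

    The set of transformation rules between characteristics is extra
    and is deliberately left out of this count.
    """
    return {"analysis": n, "synthesis": n}


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, dependent_counts(n), independent_counts(n))
```

For ten languages this gives ninety analysis algorithms against ten; for two languages both schemes need two, so the saving appears only as the number of languages grows.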

There  exists  yet  another  approach,  as  follows:  After  the  nec- 
essary correspondences  have  been  made  between  the  sets  of 
characteristics  of  concrete  languages,  these  sets  are  united  in  a 
particular  manner  into  one  maximal  set  (macroset)  that  suf- 
fices for  the  unique  expression  of  the  meaning  of  a  text  in  any 
of  the  input  languages.  This  universal  set  of  characteristics  is 
regarded  as  an  interlingua.  Then,  analysis  will  always  lead  di- 
rectly from  the  input  text  to  universal  characteristics,  and  syn- 
thesis begins  immediately  with  these  characteristics.  In  this 
approach,  a  special  stage  of  transformation  (between  analysis 
and  synthesis)  is  apparently  practically  nonexistent,  because  of 
the  inclusion  of  aspects  of  transformation  in  analysis  and  syn- 
thesis. 

The  interlingua,  in  this  sense,  is  nothing  other  than  a  nota- 
tional  system  applicable  for  a  unique,  explicit,  and  sufficiently 
suitable  expression  of  the  meaning  contained  in  texts  in  lan- 
guages subjected  to  translation. 

This  position  is  entirely  in  agreement  with  the  principles  of 
the  "100  per  cent  approach"  to  MT  mentioned  above,  which 
requires  that  translation  be  realized  "by  the  meaning,"  i.e., 
that  the  meaning  be  extracted  from  the  text  being  translated, 
written  in  a  special,  standard  form,  and  then  that  the  output- 
language  text  be  constructed  only  according  to  this  meaning, 
independent  of  the  input  text. 

Before proceeding to the question of the form of an interlingua,
we shall touch, in passing, on the question of whether an
interlingua is necessary, which has been raised in the literature.

The opponents of interlingua have indicated that its advantages
(reducing the number of analysis algorithms) can only
become effective for a rather large number of languages, while
for three or four languages, and especially for only two, the
interlingua is not at all necessary, since it yields little advantage
in the number of algorithms and complicates each of them.
However, as we have said earlier, in binary translation, too, a
certain "intermediary language" is applied, e.g., the characteristics
of the output text obtained from analysis. V. H. Yngve
has shown that nearly all algorithms apply an "intermediary
language," even if only implicitly and unconsciously. For example,
in the French-English algorithm of Birkbeck College (in England),
the dictionary is divided into French and English parts.
Each French word has stored with it not its English equivalent
but only the address of the location of its equivalent. The set of
addresses in fact represents the "intermediary" or transitional
language, as Yngve has called it. Such a "language" permits the
writing of language information in the machine in the most
economical form and is convenient in machine operations.
Since these "intermediary languages" exist, they must be applied
deliberately. It now becomes apparent that interlingua is
necessary both in binary translation and in multiple translation, and
Yngve's group (at M.I.T.) is occupied with developing an
interlingua for German-English translation.

Of  course,  there  remains  the  purely  terminological  question: 
Should  one  call  just  any  "intermediary"  (transition)  language 
an  interlingua? 

Still another argument is used against interlingua: Interlingua,
while decreasing the overall number of algorithms
from $n + n(n - 1) = n^2$ to $2n$, i.e., in the ratio $n^2 : 2n = n/2$ (for
twenty languages, a tenfold reduction), seems to lead to greater
complexity of the algorithms. But this assertion is rather
indefinite, for there does not exist at present a way of evaluating
the "complexity" or the "simplicity" of algorithms. Moreover,
no one has yet compared algorithms constructed in conjunction
with an interlingua with algorithms in which interlingua is not
used at all (if the latter, in fact, exist; see above).
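
The counts quoted in this argument can be checked with a short calculation. The Python sketch below is only an illustration: the function name algorithm_counts is our own, and the count leaves aside the transformation rules that independent analysis also requires.

```python
def algorithm_counts(n: int) -> dict:
    """Count the algorithms needed to translate among n languages."""
    # Dependent analysis: one analysis algorithm for every ordered
    # (input, output) pair of languages, plus one synthesis algorithm
    # per language.
    dependent = n * (n - 1) + n
    # Independent analysis with an interlingua: one analysis and one
    # synthesis algorithm per language.
    independent = 2 * n
    return {
        "dependent": dependent,
        "independent": independent,
        "ratio": dependent / independent,
    }


for n in (2, 4, 10, 20):
    print(n, algorithm_counts(n))
```

For two languages the ratio is 1, which is the opponents' point; the saving grows only linearly, as n/2.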

At  present,  the  need  for  interlingua  as  such  is  recognized  by 
all  groups  in  the  U.S.S.R.,  by  the  researchers  in  the  Cambridge 
group  in  England,  by  V.  H.  Yngve's  group  in  the  U.S.A.,  and 
by  others.  However,  the  form  of  the  interlingua  is  not  as  yet 
decided  upon. 

In  the  literature  four  types  of  interlinguas  are  discussed: 

(1) One of the natural languages may be used as an interlingua
(e.g., the language of the country in which particular
MT algorithms are being created). But since the interlingua
must ensure a monovalent, explicit, and maximally economical
notation for meaning extracted from the input text, and no
natural language satisfies these requirements, this method
apparently is not being followed consistently by anyone in practice.

(2)  The  interlingua  may  consist  of  a  standardized  and  sim- 
plified natural  language.  An  example  of  this  is  the  "Model  Eng- 
lish" proposed  by  Stuart  C.  Dodd  [41]. 

(3)  The  interlingua  may  be  one  of  the  artificial  interna- 
tional languages,  such  as  Esperanto  or  Interlingua.  The  use  of 
Interlingua  as  an  interlingua  has  been  studied  by  A.  Gode  [31]. 

(4)  However,  a  method  more  likely  to  be  useful  is  the  crea- 
tion of  specially  adapted  artificial  languages  for  MT.  Pioneer 
groups  dealing  directly  with  the  problem  of  interlingua  (at 
Cambridge,  at  Leningrad  University,  and  at  the  MT  Federa- 
tion in  Moscow)  have  all  come  to  the  same  conclusion:  con- 
struction of  an  interlingua  as  a  system  of  correspondences 
among  natural  languages  (for  simplicity  in  presentation,  we 
shall  not  touch  upon  the  differences  existing  among  the  ap- 
proaches used  by  the  groups  mentioned).  This  viewpoint  is 
most  fully  presented  in  the  publications  of  the  Cambridge  group 
in  presenting  the  so-called  "thesaurus  method"   (  [42],   [43] ). 

A  thesaurus  is  a  particular  kind  of  dictionary  in  which  words 
are  grouped  into  thematic  classes  that  are  divided  into  sections 
and, further, into categories.^9 In the most famous dictionary of
this kind, Roget's International Thesaurus of English Words
and Phrases, there are six classes, twenty-four sections, and more
than  1,000  categories.  For  example:  The  class  "Space"  includes 
the  sections  "General,"  "Measurement,"  "Form,"  "Motion"; 
the  section  "Motion"  is  divided  into  the  categories  "Change  of 
Position,"  "Rest,"  "Land  Travel,"  "Flying  (air  travel),"  "Trav- 
eller," "Sailor,"  "Aeronaut,"  etc.  In  addition  to  being  joined 
into  thematic  groups,  the  words  are  listed  alphabetically,  and 
each  is  assigned  numbers  (or  headings,  called  "heads")  for  the 
thematic  groups  to  which  it  belongs. 

A word can belong simultaneously to several groups, as in the
case of homonyms ("rock," as skala [cliff, crag], or as kachat'
[to rock]), or in the case of polysemy ("rod," as sterzhen'
[stirring rod], or as rozga [birch rod]).

^9 The term "thesaurus" is also used to refer to dictionaries in which the
lexical system of a language is presented quite thoroughly.

The entry for the word "flat" from Roget's Thesaurus is:

flat  172  inertia
      191  story, level
      207  low
      213  horizontal
      223  color
      etc.

In  other  words,  every  word  has  assigned  to  it  series  of  syn- 
onyms with  which  it  is  associated  (in  various  meanings);  a  ser- 
ies of  synonyms  (or  rather,  the  group  nearest  to  the  word  in 
meaning)  forms  a  thematic  category. 

Thesauri  can  also  be  interlingual.  In  that  case,  groups  of 
words  from  several  languages,  similar  in  meaning,  are  joined 
into  the  same  thematic  category. 

Translation  of  lexical  content  is  done  in  two  stages  when  an 
interlingual  thesaurus  is  used: 

(1)  There  may  be  several  thematic-category  numbers  with 
the  word  to  be  translated,  and  the  necessary  number  (that  most 
suitable  in  the  given  context)  is  chosen  first;  for  this  purpose, 
sets  of  such  numbers  are  compared  for  syntactically  connected 
words,  and  common  numbers  are  selected. 

(2)  All  words  in  the  output  language  that  are  near  in  mean- 
ing, and  might  in  a  particular  context  be  the  equivalents  of  a 
given  word,  are  pulled  according  to  their  thematic-category 
number.  The  choice  of  the  proper  equivalent  from  among  sev- 
eral possible  ones  is  made  according  to  special  rules  belonging 
entirely  to  the  output  language. 
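
The two stages can be sketched in Python under strong simplifying assumptions. The heads listed for "flat" are those of the Roget entry quoted earlier; the heads given for "surface," the Russian candidates, and the names choose_head and candidates are hypothetical and serve only to make the sketch runnable. The final choice among candidates, which belongs to the rules of the output language, is not modeled.

```python
# Heads for "flat" follow the Roget entry quoted in the text; every other
# entry in these tables is invented for illustration.
INPUT_HEADS = {
    "flat": {172, 191, 207, 213, 223},
    "surface": {213, 220},  # hypothetical heads
}
OUTPUT_WORDS = {
    213: ["ploskij", "gorizontal'nyj"],  # hypothetical Russian candidates
}


def choose_head(word, neighbours):
    """Stage 1: keep the heads a word shares with its syntactic neighbours."""
    common = set(INPUT_HEADS[word])
    for other in neighbours:
        common &= INPUT_HEADS[other]
    return common


def candidates(heads):
    """Stage 2: pull the output-language words filed under the chosen heads."""
    return [w for h in sorted(heads) for w in OUTPUT_WORDS.get(h, [])]


heads = choose_head("flat", ["surface"])
print(heads, candidates(heads))
```

Set intersection is used here because the text speaks of selecting the numbers common to syntactically connected words; a fuller treatment would have to handle the case in which no common number, or more than one, remains.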

In  the  specially  constructed  thesaurus,  where  groups  of  words 
in  various  languages  are,  taken  as  a  whole,  mutually  and 
uniquely  related  to  one  another,  thematic-category  numbers  may 
be  thought  of  as  the  words  of  an  interlingua. 

The relations among semantic elements (words) in the interlingua
can be expressed by the same indexes, with symbols
for the related elements [55], or with parentheses grouping
pairs of elements (the defining and the defined) so that a pair
included in parentheses may be thought of as a single element
[49]. The interlingua of the Cambridge group does not have
grammar in the general sense (number, case, tense).

We  shall  not  describe  in  detail  the  approaches  of  the  Lenin- 
grad and  Moscow  groups  to  the  problem  of  interlingua  but 
shall  refer  the  reader  to  the  relevant  publications:  [1],  [2], 
[3],  [13].  We  shall  only  note  that  workers  in  the  Leningrad 
group  have  already  obtained  practical  results.  They  have  de- 
veloped an  experimental  version  of  interlingua  for  a  series  of 
natural  languages  (Russian,  Czech,  English,  Indonesian,  and 
others),  and  soon  an  experimental  machine  translation  should 
be  realized  from  any  one  of  these  languages  to  another,  using 
interlingua.  Along  with  the  interlingua  created  by  determina- 
tion of  the  correspondences  among  natural  languages,  another 
type  of  interlingua  is  possible:  purely  logical,  developed  from 
analysis of the content of some science but without introduction
of data from natural languages. Apparently, the members
of Perry and Kent's group in the U.S.A. and of the Electromodeling
Laboratory of VINITI^10 in the U.S.S.R. are following this
method.


6.  Formalization  of  Algorithm  Notation 

In  conjunction  with  the  problem  of  interlingua,  much  atten- 
tion has  been  drawn  to  the  question  of  a  specialized  "language" 
for  MT  algorithm  notation.  Because  such  a  "language"  per- 
mits a  generally  known  standardization  of  algorithms,  it  sim- 
plifies their  construction  and  control  and,  most  of  all,  essen- 
tially simplifies  their  programming  by  permitting  a  transition 
to  automatic  programming.  Formal  notation  for  algorithms  pre- 
supposes the  use  of  a  small  number  of  precisely  defined  expres- 
sions (commands,  questions,  etc.).  A  standard  program  is  made 
for  the  realization  of  each  such  expression.  Then,  since  all  ex- 
pressions have  a  standard  form,  the  machine  can  decipher  these 
expressions  and  replace  them  with  the  corresponding  programs. 
In other words, automatic programming is nothing other than
a machine translation of the MT algorithm itself from the
language in which it was written by the analyst to the internal
language (the so-called "order code") of a particular machine.
Naturally, the more standardized and regular the initial notation
of the algorithm, the more simply the corresponding translation
is realized.

^10 [VINITI = All-Union Institute of Scientific and Technical Information. Tr.]

Several MT groups apply a logico-mathematical symbolism
as algorithm notation for finding predicates, augmented by a
series of conditional designations (the Harvard and Georgetown
groups in the U.S.A.). A special symbolic language, which
includes designations of language elements and of the operations
being performed, has been developed by the Leningrad
group. This language has been tested in practice in the writing
of several algorithms [11].

The language presented by Yngve for writing algorithms
(his programming language), COMIT [65], has still another
structure. The essence of Yngve's idea is that a single standard
form  is  used  for  writing  the  rules  composing  the  program. 
Each  rule  has  five  parts.  The  number  of  the  rule  is  written  in 
part  I,  and  part  V  contains  the  number  of  the  rule  to  which  to 
proceed  after  carrying  out  the  operations  required  by  the  pres- 
ent rule.  In  part  II  are  indicated  the  elements  (words,  parts  of 
words,  etc.)  or  attributes  on  which  to  perform  the  operation; 
what  is  to  be  done  with  these  elements  or  attributes  (substitu- 
tion, erasure,  or  addition  of  elements;  ascription  or  erasure  of 
attributes;  etc.)  is  shown  in  part  III.  Part  IV  defines  the  bound- 
ary of  the  algorithm  to  which  the  particular  rule  applies,  and 
sometimes  contains  an  indication  about  a  transition  to  this  or 
that  subrule  of  the  rule  (this  indication  to  be  used  by  a  special 
part  of  the  algorithm,  called  the  "dispatcher"). 
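
The five-part form can be modeled loosely in Python. The sketch below does not reproduce COMIT syntax; the class Rule, the function run, and the sample German words are our own assumptions, and part IV is merely recorded, not interpreted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    number: int                         # part I: the rule's own number
    target: str                         # part II: the element to operate on
    action: Callable[[str], List[str]]  # part III: what to do with it
    scope: str                          # part IV: where the rule applies (recorded only)
    goto: Optional[int]                 # part V: the rule to proceed to next


def run(rules, tokens, start):
    """Apply rules in the order dictated by part V until no rule follows."""
    table = {rule.number: rule for rule in rules}
    current = start
    while current is not None:
        rule = table[current]
        result = []
        for token in tokens:
            # Matching elements are rewritten; all others pass through.
            result.extend(rule.action(token) if token == rule.target else [token])
        tokens = result
        current = rule.goto
    return tokens


rules = [
    Rule(1, "der", lambda t: ["the"], "sentence", 2),  # substitution
    Rule(2, "ja", lambda t: [], "sentence", None),     # erasure
]
print(run(rules, ["der", "Mann", "ja", "kommt"], start=1))
```

Because every rule has the same shape, a program is simply a table of such records, which is what makes mechanical interpretation straightforward.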

COMIT  is  used  by  the  MT  group  at  M.I.T.  for  writing  algo- 
rithms, in  particular,  a  German-English  algorithm.  COMIT  is 
also  beginning  to  be  applied  by  several  other  groups  in  the 
U.S.A. 

The so-called "operator notation" for MT algorithms developed
by O. S. Kulagina ([4], [5]), in addition to introducing
a standard form of rules, contains a whole list of allowable
operations, called operators. An operator is a small algorithm
handling one precisely specified part of a problem: e.g., verifying
the presence of an attribute, noting an attribute, searching for
words with particular attributes. The operator has a fixed
internal structure but variable parameters; thus, one and the
same operator can, for example, verify the presence of various
attributes for various words. Kulagina's operators are like standard
details [i.e., components] from which the MT algorithm
is formulated.

On  the  basis  of  the  analytic  part  of  the  French-Russian  algo- 
rithm (  [6],  [8]  ),  Kulagina  selected  seventeen  operators:  three 
different  verification  operators,  two  different  search  operators, 
an  erasure  operator,  an  operator  for  inserting  words,  etc.  These 
operators  are  not  all  bound  to  the  specifics  of  the  French  lan- 
guage and  can  be  applied  in  algorithms  for  a  number  of  other 
languages. 
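
The notion of an operator with a fixed internal structure but variable parameters can be illustrated by a minimal Python sketch. This is not Kulagina's notation: the three operators, their names, and the representation of a word as a dictionary with a set of attributes are assumptions made for the example.

```python
def check_attribute(attribute):
    """Verification operator: does a word carry the given attribute?"""
    def operator(word):
        return attribute in word["attrs"]
    return operator


def note_attribute(attribute):
    """Operator that ascribes the given attribute to a word."""
    def operator(word):
        word["attrs"].add(attribute)
        return word
    return operator


def search(attribute):
    """Search operator: positions of the words carrying the attribute."""
    def operator(sentence):
        return [i for i, word in enumerate(sentence) if attribute in word["attrs"]]
    return operator


sentence = [
    {"form": "la", "attrs": {"article"}},
    {"form": "maison", "attrs": {"noun"}},
]
is_noun = check_attribute("noun")        # same operator, one parameter value
is_article = check_attribute("article")  # same operator, another value
find_articles = search("article")
note_attribute("feminine")(sentence[1])
print(is_noun(sentence[1]), find_articles(sentence), sentence[1]["attrs"])
```

Nothing in these operators refers to French, which is the sense in which such components can be reused in algorithms for other languages.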

Thanks  to  the  application  of  operators,  the  logical  structure 
of  the  algorithms  becomes  quite  explicit,  and  their  construc- 
tion is  thus  simplified.  Operator  notation  permits  a  transition 
to  the  automatic  programming  of  algorithms.  Kulagina  has  per- 
formed an  experiment  in  automatic  programming  of  part  of 
the  Hungarian-Russian  algorithm  [12];  in  five  minutes,  the 
machine  constructed  five  programs  that  would  have  taken 
twenty  to  thirty  man-days. 

The  idea  of  operator  notation  seems  highly  productive;  at 
present,  and  as  a  continuation  of  Kulagina's  work,  a  compila- 
tion of  so-called  algorithm  operators  is  being  made  [14].  Op- 
erators connected  with  programming  technique,  with  peculiari- 
ties of  realization,  are  excluded  from  this  compilation,  and  new 
operators  resulting  from  the  creation  of  a  more  complex  type 
of  algorithm  are  introduced. 


7.  The  Interaction  of  Man  and  Machine  during  MT 

This  question  has  many  interesting  facets  of  which  we  shall 
mention  several  here. 

Man can participate in the process of MT either by initially
preparing the text to aid the machine in handling multivalence,
etc. (pre-editing), or by the necessary polishing of the
rough translation made by the machine (postediting). The
question of the usefulness of pre- or postediting (or of both)
still remains unsolved. Most researchers are inclined to prefer
postediting, though there are no exact figures on this. Evidently,
Y. Bar-Hillel was right [22] in emphasizing the importance of
pre- or postediting and in indicating that, since high-quality,
fully automatic translation is not at first achievable, it would
be  desirable  to  organize  an  intelligent  interaction  between  man 
and  machine  and  to  arrive  as  quickly  as  possible  at  partially 
automatic  mass  translation.  This  would  permit  the  accumula- 
tion of  valuable  experience  for  the  further  development  of  ma- 
chine translation. 

Electronic  computers  can  be  successfully  applied  to  assist  hu- 
mans in  varied  research  on  language.  During  the  1950's,  sev- 
eral experiments  were  conducted  in  which  the  machine  helped 
to  produce,  with  minimal  expenditure  of  time  and  effort,  list- 
ings ("concordances")  of  large  quantities  of  text:  of  the  Bible, 
of  the  preachings  of  Thomas  Aquinas,  of  the  Dead  Sea  Scrolls, 
etc.  (see  papers  by  Cook  [25],  Tasman  [57],  and  Ellison). 

All  of  these  experiments  demonstrated  the  usefulness  of  com- 
puters in  various  kinds  of  lexicographic  work  (extracting  dic- 
tionary materials  from  text,  sorting  these  materials,  etc.),  and 
for  all  sorts  of  statistical  counts:  machine-aided  calculation  of 
the  frequencies  of  letters  and  morphemes,  words,  and  even  syn- 
tactic constructions;  thus,  the  National  Bureau  of  Standards 
produced  a  frequency  count  for  various  kinds  of  syntactic  con- 
structions for  English  using  the  SEAC  [56].  Such  application 
of  machines  has  great  value  not  only  for  MT  but  also  for  lin- 
guistics as  a  whole. 

Experiments  involving  "learning  machines"  are  especially  in- 
teresting; "learning"  is  used  here  in  its  broadest  conditional 
sense.  The  simplest  such  experiment  involves  a  machine's  com- 
pleting its  own  dictionary  independently  during  the  transla- 
tion. A  word  in  the  text  to  be  translated,  but  not  in  the  diction- 
ary, is  pulled  from  the  text  along  with  the  defining  context 
and  an  indication  of  its  text  location;  then  it  is  placed  in  the 
dictionary  in  alphabetic  order.  A  man  then  writes  the  necessary 
dictionary  information  (in  the  MT  groups  of  Harvard  Univer- 
sity, U.S.A.,  and  at  Birkbeck  College,  England). 

In the MT studies being conducted by the group at The
RAND Corporation (U.S.A.), the machine is expanding the
list of elementary syntactic constructions available to it.^11
Sequences of words not corresponding to any in the list are
printed out by the machine along with their text location and
are classified by specific characteristics for later study by
linguists.

We should note experiments in applying the machine to automate
the construction of MT algorithms, and even to write them automatically.
For  example,  a  plan  developed  at  the  Computation  Laboratory 
of  Harvard  University  is  as  follows.  A  word-for-word  Russian- 
English  translation  is  made  with  the  aid  of  the  machine.  This 
translation  is  corrected  by  a  posteditor  using  special  instruc- 
tions prescribing  definite  actions  and  the  writing  of  changes  in- 
troduced in  a  standard  form.  The  postedited  translation  is 
again  input  to  the  machine,  which  compares  it  with  the  initial 
(word-for-word)  translation,  discovers  the  differences,  collects 
and  classifies  them,  and  then,  on  the  basis  of  an  analysis  of  these 
differences,  constructs  an  algorithm  capable  of  introducing  into 
the  word-for-word  translation  the  same  changes  that  had  been 
written in by the posteditor. This algorithm is included in the
initial stage of the translation, and the initial translation improves.
Now  the  posteditor  receives  something  better  than  a  word-for- 
word  translation.  Once  again  he  corrects  the  text,  which  is  again 
input  to  the  machine,  and  the  cycle  is  repeated  until  the  qual- 
ity of  translations  output  by  the  machine  satisfies  the  posteditor. 
Thus  the  machine  is  able,  as  it  were,  to  "learn"  by  analyzing 
and  imitating  the  actions  of  the  posteditor  ([30],  [36],  [45]). 
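
One turn of this cycle can be imitated in a few lines of Python. The sketch is not the Harvard procedure: it assumes that corrections are single-word substitutions, uses the standard difflib module to discover the differences, and the function names and sample sentences are our own.

```python
from collections import Counter
from difflib import SequenceMatcher


def collect_corrections(rough, edited):
    """List (old words, new words) pairs where the posteditor changed the text."""
    a, b = rough.split(), edited.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tuple(a[i1:i2]), tuple(b[j1:j2])))
    return changes


def learn_rules(pairs):
    """Collect and classify the differences; keep one-word substitutions."""
    counts = Counter()
    for rough, edited in pairs:
        counts.update(collect_corrections(rough, edited))
    rules = {}
    for (old, new), _ in counts.most_common():
        if len(old) == 1 and len(new) == 1:
            rules.setdefault(old[0], new[0])  # most frequent correction wins
    return rules


def apply_rules(rules, rough):
    """Introduce the learned changes into a new rough translation."""
    return " ".join(rules.get(word, word) for word in rough.split())


pairs = [("machine translates text quick", "machine translates text quickly")]
rules = learn_rules(pairs)
print(rules, apply_rules(rules, "he runs quick"))
```

Repeating learn_rules on each newly postedited batch and applying the accumulated rules before the next round corresponds to the repetition of the cycle described above.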


8.  Some  Facts  about  Work  in  MT 

In  the  preceding  sections  no  exhaustive  characterization  of  all 
the  basic  problems  of  MT  is  to  be  found.  These  sections  are 
meant  only  to  give  the  reader  a  general  idea  of  the  state  of  ma- 
chine translation. 

Machine translation is a little over ten years old. The idea of
mechanizing translation from one language to another was
expressed by the Soviet inventor P. P. Troyansky as far back as
1933; in that year Troyansky obtained a patent for his translating
machine.^12 But at that time Troyansky's ideas did not
receive the necessary development. After the invention of
high-speed electronic computers, the idea of mechanizing translation
with their aid arose once again (1946, Weaver and Booth);
in 1949, the first research was begun (in the U.S.A.). In 1952,
the Massachusetts Institute of Technology called the First
Conference on MT, and from then on the number of publications
dedicated to MT questions has risen steadily. In the beginning
of 1954, IBM conducted an experiment in Russian-English
translation on the IBM 701. Thus, the possibility of MT was
proven in practice. In the U.S.S.R., work on MT began in 1954,
and in 1956, English-Russian and French-Russian translations
were realized. Since 1955, more new groups have joined in MT
research. The scope of the work has been increasing steadily.

^11 [In fact, the machine has not done more than aid in the expansion. Tr.]

At  present,  machine  translation  is  being  pursued  in  the  fol- 
lowing countries:  the  U.S.S.R.,  the  U.S.A.,  England,  Japan, 
China,  Czechoslovakia,  Italy,  France,  Sweden,  Israel,  Mexico, 
and  India.  Only  the  U.S.A.  has  more  than  ten  groups  partici- 
pating. These  groups  are  concentrated  in  the  larger  research 
centers,  such  as  the  universities— Harvard,  Georgetown,  Wash- 
ington, Chicago,  M.I.T.,  and  others;  and  in  corporations— 
RAND  and  Ramo-Wooldridge;  etc.  The  largest  of  the  groups 
includes dozens of workers. There are two groups at work in
England (Birkbeck College and Cambridge University^13). In the
U.S.S.R., five groups in Moscow are working on MT and
related problems (at the Pedagogical Institute of Foreign
Languages and at four institutes of the Academy of Sciences: the
Mathematics Institute, the Institute of Precise Mechanics and
Computer Techniques, the Electromodeling Laboratory of
VINITI, and the Institute of Linguistics); and there is one
group in each of five other cities: Leningrad (Leningrad
University), Kiev (Kiev University and the Computational
Center of the Academy of Sciences), Erevan (Computational
Center of the Armenian S.S.R.), Tbilisi (Institute of Automatics
and Telemechanics of the Georgian Academy of Sciences), and
Gorky (Radiophysical Institute).

^12 See the brochure "Perevodnaya mashina P. P. Troyanskogo" ["The
Translation Machine of P. P. Troyansky"], published in 1959 by the Izd-vo Akademii
nauk SSSR, Moscow, pp. 1-40 (translated in JPRS 3532, U.S. Joint Publications
Research Service, July, 1960, pp. 1-39).

^13 [The Cambridge Language Research Unit is actually independent of the
University. Tr.]

In  the  U.S.A.  and  in  the  U.S.S.R.,  special  journals  are  pub- 
lished on  MT:  Mechanical  Translation  (M.I.T.)  and  Mashinnyj 
perevod  i  prikladnaya  lingvistika  [Machine  Translation  and 
Applied  Linguistics]  (Moscow  Institute  of  Foreign  Languages). 

The  group  of  languages  being  machine  translated  has  greatly 
increased.  Whereas  attention  at  first  was  primarily  concentrated 
on  Russian  and  English,  work  is  now  being  conducted  on  MT 
in  the  following  languages  as  well:  French,  German,  Italian, 
Chinese,  Hindi,  Japanese,  Indonesian,  Arabic,  Hungarian, 
Czech,  Georgian,  Armenian,  and  others. 

From  1957  to  1960,  quite  a  few  experimental  machine  trans- 
lations were  made  both  in  the  U.S.S.R.  and  abroad.  At  the 
Mathematics Institute, French-Russian translation experiments
have  been  conducted  that  include  translations  of  selected  run- 
ning texts;  examples  of  phrases  translated  by  the  machine  ap- 
pear in  [7]  and  [23].  Recently,  English-Russian  translation 
experiments  have  been  begun  there,  too. 

Experimental  Russian-English  translations  have  been  made 
by  various  groups  in  the  U.S.A.  The  Harvard  and  Georgetown 
groups  and  that  at  The  RAND  Corporation  conduct  these  ex- 
periments more  or  less  regularly. 

MT experiments have been conducted successfully from
French to English, from Russian and English to Chinese, from
English to Japanese, and from English to Czech, in England,
China, Japan, and Czechoslovakia, respectively.

The  experience  accumulated  as  a  result  of  these  experiments 
has  permitted  the  serious  undertaking  of  mass  MT.  Further 
development  of  the  theory  of  MT  needed  here  will  lead  to  the 
presentation  of  new  and  interesting  problems  and  will  have 
considerable influence on linguistics as a whole.^14


^14 The author expresses his sincere gratitude to V. V. Ivanov, A. A. Reformatskij,
O. S. Kulagina, and L. N. Iordanskaya for their valuable notes and advice.


CHAPTER V


The  Application  of 
Statistical Methods in
Linguistic  Research 


1.  Random  Events  and  Statistical  Regularities;  the  Concept 
of  Statistical  Rules 

In  studying  language,  we  constantly  encounter  situations  in 
which  various  language  phenomena  cannot  be  described  fully 
and  yet  briefly.  Language  is  a  system  composed  of  a  large  num- 
ber of  diverse  objects  interacting  according  to  very  complex 
laws.  The  functioning  of  linguistic  units  usually  depends  on 
so  many  factors  that  it  is  practically  impossible  to  take  them  all 
into  account  and  determine  the  outcome  of  their  interaction. 
For  this  reason,  it  is  only  comparatively  rarely  that  one  can  for- 
mulate strict,  fully  determined  rules  about  language  objects. 

By  rules  that  are  fully  determined,  we  mean  assertions  of  the 
following  type:  Upon  realization  of  a  fully  determined  com- 
plex of  conditions,  a  definite  event  must  take  place. 

For example, in modern Russian, given "a voiced consonant
at the end of a word before a pause," the consonant must
become unvoiced.^1 This makes it possible to formulate fully
determined rules about a voiced consonant at the end of a word.
Such simple cases are all too rare.

^1 [For a discussion of this, see the Grammatika russkogo yazyka [Grammar
of the Russian Language], Vol. I, Izd-vo AN SSSR, Moscow, 1960, pp. 73-75.
Tr.]

Thus,  we  encounter  serious  difficulties  in  attempting  to  for- 
mulate equally  strict  rules  about  the  use  of  the  article  before 
a  noun  in  modern  English.  To  define  the  conditions  for  article 
choice  universally  and  unambiguously,  we  must  take  many  dif- 
ferent factors  into  account.  If  we  analyze  the  rules  given  in  or- 
dinary grammars,  we  are  soon  convinced  that  all  of  these  more 
or  less  brief  rules  do  not  allow  us  to  define  uniquely  the  condi- 
tions for  choosing  an  article.  If  we  assume  that  errors  occur 
when  the  rules  do  not  embrace  all  possible  conditions,  then  we 
can  increase  the  number  of  factors  to  be  accounted  for.  The 
number  of  errors  will  decrease,  but  since  so  many  factors  in- 
fluence the  choice  of  an  article,  and  since  their  interaction  is  so 
complex,  the  rules  will  become  more  and  more  cumbersome. 
Here it does not matter how much we complicate the rules; we
still cannot be sure that they will handle all cases correctly.
Obviously, too, overcomplicated^2 rules are of little use either in the
theoretical description of language or in a practical application.

In  other  words,  we  cannot  make  a  sufficiently  exhaustive  list 
of  interacting  conditions  that  uniquely  determine  article 
choice,  and  so  we  must  continue  to  make  errors.  The  occurrence 
of  errors  when  our  rules  are  applied  indicates  that  after  ful- 
fillment of  the  complex  of  conditions  enumerated  in  the  rules, 
a  "substitution  of  a  certain  type  of  article"  may  or  may  not  take 
place.  Such  an  event  is  called  random  with  respect  to  that  com- 
plex of  conditions. 

However, when a certain event A is random with respect to
a given complex of conditions S, this does not mean that one
cannot establish any connection between A and S in general.
Specifically, although event A may or may not occur even when all
the conditions in set S occur simultaneously, a certain regularity
in the occurrence of A is observable when S occurs a large number
of times: event A has a definite frequency.^3

For example, in applying the rules of article placement (presented
in an authoritative English grammar) in order to translate
a certain phrase from some language into English, we could
well make some errors, such as would be shown by the nonoccurrence
of A (the choice of the proper article) in spite of the
occurrence of S (the conditions enumerated in the rules).
However, if we apply these rules to a large body of text, we
shall determine the article correctly in a significant number of
cases, and if the rules have been formulated particularly well,
such cases will be in the majority. Moreover, if we use these
rules to translate several different texts of large volume
containing an approximately equal number of cases in which article
rules must be applied, we shall see that the number of
instances of correct article determination will be about the same
for each.

^2 For example, the rules for choice of an article occupy 50 printed pages.

^3 By the frequency of A, we mean the ratio of the number of occurrences of
event A to the total number of times A might have occurred, i.e., to the number
of times the conditions S occurred. The concept of frequency will be discussed
in greater detail below.

This  means  that  although  our  rules  are  not  fully  determined, 
there  is  still  a  definite  connection  between  the  set  of  conditions 
enumerated  in  them  and  the  realization  of  "correct  article 
choice";  the  connection  is  expressed  quantitatively  by  the  fact 
that  correct  choices  are  made  with  a  definite  frequency  when 
the  conditions  occur  frequently. 

The  regularity  with  which  the  random  event  A  occurs  with 
a definite frequency, given frequent occurrence of the particular
set of conditions S, is called statistical. Correspondingly, we
call  those  rules  statistical  that  contain  the  following  kind  of 
statement:  Given  frequent  occurrence  of  a  fully  determined  set 
of  conditions  S,  event  A  occurs  with  a  definite  frequency. 

In  other  words,  if  we  formulate  rules  about  articles  that  will 
determine  the  article  correctly  not  less  than  80  per  cent  of  the 
time,  then  we  can  call  such  rules  statistical.  We  can  show  that 
a  large  number  of  linguistic  situations  exist  that  can  be  des- 
cribed both  fully  and  briefly  only  by  means  of  statistical  rules. 
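
In symbols, the definition can be restated as follows. The notation $n(S)$ for the number of times the conditions S occurred, $n(A, S)$ for the number of those occasions on which A also occurred, and $f(A \mid S)$ for the resulting frequency is introduced here only for illustration and is not the authors' own:

$$f(A \mid S) = \frac{n(A, S)}{n(S)}, \qquad 0 \le f(A \mid S) \le 1.$$

On this reading, the article rules mentioned above would count as statistical if, over a large number of applications, $f(\text{correct article} \mid \text{rule conditions}) \ge 0.80$.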

For  example,  in  order  to  avoid  using  the  same  noun  twice  in 
the  same  sentence  or  in  two  juxtaposed  sentences,  we  may  try 
substituting a third-person personal pronoun: Stat'ya "N"
posvyashchena  analizu  dannogo  slovosochetaniya.  Ona  podrobno 
izlagaet  .  .  .  [Article  "N"  is  devoted  to  the  analysis  of  a  certain 
phrase.  It  (instead  of  stat'ya  —  article)  describes  in  detail  .  .  .]. 

There  are,  however,  a  significant  number  of  instances  in 
which  such  substitution  is  impossible,  since  various  grammati- 
cal and  syntactic  peculiarities  of  a  particular  sentence  cause 
ambiguity with respect to the antecedent of the pronoun:
Portret napisan izvestnym khudozhnikom; ya nedavno videl ego
[The portrait was painted by a famous artist; I saw it/him⁴
recently]. Or: Sestra vstupila v artisticheskuyu gruppu; ona uekhala
na gastroli [My sister joined an artistic group; she/it⁵ went on
the stage].

Individual  instances  of  the  impossibility  of  substitution  have 
been  given  in  textbooks  on  literary  editing,  but  no  one  has  yet 
succeeded  in  formulating  strict,  entirely  specific  rules  covering 
all  cases.  This  is  apparently  due  to  the  fact  that  the  reasons  for 
this  impossibility— conditions  in  which  it  must  necessarily  be 
the  case  that  "substitution  is  impossible"  ("the  noun  must  be 
repeated")— are  usually  quite  specific,  and  frequently  depend 
on  the  individual  peculiarities  of  a  particular  phrase.  Any  at- 
tempt to  formulate  entirely  specific  rules  will  lead  to  the  neces- 
sity of  listing  all  of  these  peculiarities,  which  is  pointless.  L.  N. 
Iordanskaya (a co-worker in the Structural and Applied Linguistics
Section, Institute of Linguistics, Academy of Sciences,
U.S.S.R.),  while  studying  this  question  in  connection  with  ma- 
chine translation,  proposed  statistical  rules  for  replacement  (or 
rather  for  determining  the  impossibility  of  substitution)  of  a 
noun  with  a  third-person  personal  pronoun.  Her  approach  is 
based  on  a  deliberate  refusal  to  enumerate  the  conditions  un- 
der which  "impossibility  of  substitution"  must  occur.  She  con- 
sidered a  large  number  of  sentences  in  which  corresponding 
conditions  occurred,  while  "impossibility  of  substitution"  may 
or  may  not  have  occurred.  She  then  separated  out  the  set  of  con- 
ditions that  caused  the  most  frequent  occurrence  of  "impossi- 
bility of  substitution."  She  succeeded  in  formulating  compact 
statistical  rules  that  pointed  to  the  following  result:  "With  fre- 
quent occurrence  of  some  set  of  conditions,  in  no  less  than  94 
per  cent  of  the  cases  did  'impossibility  of  substitution'  occur; 
i.e.,  the  noun  is  to  be  repeated." 
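
The selection step described above can be sketched in Python. This is only an illustration under stated assumptions: the condition labels, the tallies, and the function names are hypothetical and are not taken from Iordanskaya's study; only the 94 per cent threshold comes from the text.

```python
from collections import defaultdict


def blocked_frequencies(observations):
    """Return, for each condition set, the share of sentences in which
    substitution by a pronoun was impossible.

    `observations` is an iterable of (conditions, substitution_impossible)
    pairs; `conditions` is a frozenset of (hypothetical) condition labels.
    """
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for conditions, substitution_impossible in observations:
        totals[conditions] += 1
        if substitution_impossible:
            blocked[conditions] += 1
    return {c: blocked[c] / totals[c] for c in totals}


def select_statistical_rules(observations, threshold=0.94):
    """Keep only the condition sets whose observed frequency of
    'impossibility of substitution' reaches the threshold."""
    frequencies = blocked_frequencies(observations)
    return {c: f for c, f in frequencies.items() if f >= threshold}


# Hypothetical tallies, invented for illustration only.
COMPETING_NOUN = frozenset({"competing noun of same gender and number"})
NO_COMPETITOR = frozenset({"no competing noun"})
observations = (
    [(COMPETING_NOUN, True)] * 48 + [(COMPETING_NOUN, False)] * 2
    + [(NO_COMPETITOR, True)] * 5 + [(NO_COMPETITOR, False)] * 45
)

rules = select_statistical_rules(observations)
for conditions, frequency in rules.items():
    print(sorted(conditions), f"{frequency:.2f}")
```

The point of the sketch is that no attempt is made to explain why substitution fails in a given sentence; the condition sets are simply ranked by how often failure accompanies them.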

We emphasize the fact that although Iordanskaya's rules are
not valid in every case, but only in the majority of cases, this
does not in the least decrease their value, because the attempt
to formulate fully specified rules for this purpose (i.e., rules
true for all cases without exception) has failed. Naturally, it is
better to have such statistical rules as those already described,
true no less than 94 per cent of the time, than to have no rules
at all or to have ten pages of rules for each individual object of
linguistic study.

⁴ [Both portret and khudozhnikom are masculine singular, while ego is
ambiguously animate or inanimate and could refer to either noun. - Tr.]

⁵ [Both sestra and gruppu are feminine singular, while ona, ambiguously
animate or inanimate, could refer to either. - Tr.]

The  study  of  random  events  and  of  the  statistical  regularities 
to  which  they  are  subject  is  the  province  of  special  mathemati- 
cal disciplines:  probability  theory  and  mathematical  statistics. 
Accordingly,  to  the  degree  that  certain  phenomena  in  language 
can  be  considered  as  random,  while  the  regularities  connecting 
them  with  a  definite  set  of  conditions  can  be  called  statistical, 
the  methods  of  probability  theory  and  mathematical  statistics 
must  be  applied  in  linguistics. 


2.  Method  for  Producing  Statistical  Rules;  Evaluation  of  the 
Reliability  of  the  Results  of  Observations 

Linguists  are  well  aware  that  a  significant  number  (perhaps 
even  the  majority)  of  linguistic  rules  formulated  as  if  they 
were  fully  specified  are  not  so  in  fact,  since  cases  are  constantly 
arising  in  which  an  event  does  not  occur  in  spite  of  the  occur- 
rence of  the  set  of  conditions  indicated  in  the  rules,  even  though 
it  supposedly  had  to  occur.  For  this  reason,  many  assertions  are 
made  in  linguistics  that  later  turn  out  to  be  sometimes  true, 
sometimes  false  in  analogous  conditions.  We  must  make  such 
assertions  more  accurate  by  indicating  how  frequently  they  are 
valid, i.e., by creating statistical rules.⁶

Since a statistical regularity connecting a set of conditions
S with an event A is apparent only when S occurs frequently, it
is evident that to formulate statistical rules we must make many
observations of the co-occurrence frequency of A with S. In
mathematical statistics, there are definite general methods by
which such observations are to be made and frequencies calculated
for the occurrence of random events.

⁶ See [4], pp. 112-130, for example.

In general, linguists have long been occupied with calculations
of the frequencies of various kinds of events: the appearance
of certain phonemes, words, forms, constructions, etc.
Here,  in  fact,  the  appearance  of  a  certain  form  or  word  from 
among  others  in  a  text  of  a  particular  period  or  author  should  be 
looked  upon  as  a  random  event,  while  the  set  of  conditions 
with  which  a  certain  event  is  connected  by  statistical  regularity 
extends  to  such  concepts  as  "Russian  texts  of  the  seventeenth 
century,"  etc.  Such  calculations  should  also  be  made  according 
to  the  rules  of  mathematical  statistics.  Yet,  in  the  overwhelming 
majority  of  studies,  these  rules  are  not  observed  and,  what  is 
especially  important,  the  results  obtained  are  not  accompanied 
by  evaluations  of  their  reliability. 

The  need  to  evaluate  reliability  is  dictated  by  the  following 
considerations.  A  linguist  attempting  to  make  some  assertion 
about  a  linguistic  fact  cannot,  as  a  rule,  track  down  all  the  pos- 
sible text  ("parole")  from  a  given  period  or  ethnic  group;  the 
researcher is confined to studying a specific part of this whole,
namely some collection of texts or sound recordings. And so one must
judge whether an event occurs frequently (i.e., one must judge
to what extent the occurrence of condition S brings with it the
occurrence of A) on the basis of a limited number of observations.

According  to  linguistic  practice,  the  quantity  of  material  be- 
ing studied  is  usually  limited  to  the  amount  required  by  the  re- 
searcher. Having  selected  the  quantity  of  text  that  he  can  han- 
dle, which  he  therefore  considers  sufficient,  and  having  written 
an  imposing  number  of  cards— hundreds  or  even  thousands— 
the  author  considers  his  material  complete  and  begins  to  ana- 
lyze and  distribute  it. 

The results of this type of research usually lead to an assertion
such as: "In the language of contemporary Russian literature
. . . ," or "In sixteenth-century English . . . ." But the author
will  have  studied  only  a  small  part  of  the  total  text  comprising 
"contemporary  Russian  literature."  Just  how  legitimate  is  such 
generalization  of  results,  obtained  from  doubly  limited  mate- 
rial, to  the  entire  "language  of  contemporary  Russian  litera- 
ture"? If  one  cannot  treat  all  relevant  text  but  only  some  part 
of  it,  one  must  attempt  to  draw  conclusions  about  the  whole 
text from this sample. If, however, one must generalize from a
part  to  the  whole,  then  one  ought  to  know  just  how  closely  the 
facts  pertaining  to  a  part  actually  correspond  to  what  takes 
place  in  the  whole.  Naturally,  absolute  correspondence  is  im- 
possible; the  degree  of  correspondence  can  vary  considerably, 
making  the  reliability  of  our  conclusions  about  the  totality  ex- 
tremely variable.  Hence,  the  intelligent  thing  is  to  demand 
that  the  reliability  of  conclusions  about  a  whole  text  general- 
ized from  a  sample  be  indicated  in  each  case.  For  example,  if 
on  the  basis  of  a  specific,  limited  number  of  observations  an  as- 
sertion is  made  that  for  every  100  times  certain  rules  were  ap- 
plied, 85  instances  of  correct  article  choice  occurred,  then  one 
must  still  indicate  how  reliable  his  assertion  is;  if  another  re- 
searcher takes  another  100  cases  (i.e.,  applies  the  rules  to  an- 
other text),  the  occurrence  of  "correct  article  choice"  may 
amount  not  to  85  times  but  to  60. 

According  to  mathematical  statistics,  the  fact  that  only  a  part 
of  a  whole— a  sample— is  available  for  immediate  observation 
does  not  prevent  us  from  making  quite  adequate  statements 
about  the  whole.  However,  this  is  possible  only  when  certain 
requirements  regarding  sampling  are  fulfilled. 

There  exist  in  mathematical  statistics  special  methods  for 
evaluating  the  reliability  of  results  and  for  determining  how 
large  a  sample  must  be  to  guarantee  a  certain  degree  of  relia- 
bility. We  shall  discuss  one  of  the  simpler  methods  later.  Be- 
cause of  space  limitations,  certain  concepts  of  mathematical  sta- 
tistics introduced  below  will  not  be  explained  in  detail;  we 
shall  only  refer  to  the  relevant  textbook  sources. 


3. The Concepts of Frequency, Selection, and Relative Error

Let  us  take  the  following  problem.  If  in  a  contemporary  Rus- 
sian mathematics  text  two  nouns  occur  together  without  a  prep- 
osition, one  noun  governing  the  other,  what  will  be  the  case  of 
the  dependent  noun? 

We  know  from  the  rules  of  Russian  grammar  that  the  case 
of  the  dependent: 

(1) must not be nominative;

(2) must not be prepositional, since only constructions without
prepositions are involved here;

(3) must not be accusative, since such constructions have not been
encountered in Russian grammar;

(4) is not determined by the case of the main noun.

On  this  basis,  one  can  assert  that  if  there  is  a  construction  in- 
volving two  nouns  in  which  one  governs  the  other,  then  the  de- 
pendent noun  will  be  in  the  genitive,  the  dative,  or  the  instru- 
mental case.  This  assertion  does  not  satisfy  us,  however,  be- 
cause it  is  so  indefinite.  We  can  try  to  make  it  more  specific  by 
asserting  that  one  of  the  possible  cases  is  encountered  more 
frequently  than  the  others.  To  make  this  explicit  we  must  con- 
duct an  experiment  that  consists  of  making  observations  of  text. 
In  doing  so,  we  recognize  that  the  appearance  of  any  of  the 
three  enumerated  cases  is  an  accidental  event  with  respect  to 
the  set  of  conditions:  "a  construction  of  any  two  nouns  with- 
out a  preposition,  where  one  noun  governs  the  other."  Given 
frequent  occurrence  of  that  set  of  conditions,  it  is  necessary  to 
state  how  frequently  the  random  event  occurs:  "the  appearance 
of  a  particular  case."  So  there  are  three  possible  results: 

(1)  appearance  of  the  genitive  case  (gen.), 

(2)  appearance  of  the  dative  case  (dat.), 

(3)  appearance  of  the  instrumental  case  (instr.). 

We shall call the frequency of appearance of a given case the
ratio of the number of times that case appears to the total number
of times that it might appear, i.e., to the number of two-noun
constructions without prepositions. Let us call the frequencies of
occurrence of the corresponding cases $P_{gen}$, $P_{dat}$, and
$P_{instr}$. Numerical determination of these frequencies will be
the solution of the problem. In inspecting text, we suppose that
these frequencies are constant for all relevant texts, i.e., that
whatever mathematics text we select, the ratio of use of the
genitive and other cases will be approximately the same.⁷

We already know that statistical regularities appear only
with frequent occurrence of the conditions. But what do we
mean by "frequent"? As we know, the number of observations
is always limited and represents only a sample of the whole.
What kind of sampling will guarantee us the "right" to base
our judgment of the whole on it? With this phrasing of the
question we must turn our attention to the following essential
factors.

⁷ This is a "statement of the statistical homogeneity of texts." In some cases,
it is only true in the first approximation. In mathematical statistics, there are
methods for proving the reliability of this statement for concrete instances [3].
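
The frequencies defined above are simple ratios of counts, which can be sketched in Python. The tallies below are invented for illustration and say nothing about real Russian mathematics texts; the function name `case_frequencies` is likewise an assumption of this sketch.

```python
from collections import Counter


def case_frequencies(observed_cases):
    """Return the sample frequency of each case: the number of times it
    appears divided by the number of two-noun constructions observed."""
    counts = Counter(observed_cases)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no constructions were observed")
    return {case: n / total for case, n in counts.items()}


# Hypothetical sample: the case of the dependent noun in 100 constructions.
sample = ["gen"] * 80 + ["instr"] * 12 + ["dat"] * 8
frequencies = case_frequencies(sample)

for case in ("gen", "dat", "instr"):
    print(case, frequencies[case])
```

Because every construction yields exactly one of the three cases, the three sample frequencies always add up to 1.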


3.1.  Size  of  Sample 

The  larger  the  sample  (the  more  text  is  sampled),  the  less 
chance  there  will  be  that  the  frequency  observed  in  it  differs 
significantly  from  the  frequencies  present  in  all  texts,  which  in 
our  example  comprise  the  whole  of  Russian  mathematics  litera- 
ture. If  in  determining  the  frequencies  of  cases  of  the  depend- 
ent noun  we  scan  a  sample  of  text  containing  about  100  cases  of 
two-noun,  prepositionless  constructions,  there  will  still  be  sam- 
ples of  text  that  happen  to  contain  no  instance  of  a  particular 
case,  as  well  as  samples  in  which  one  case  is  encountered  dis- 
proportionately often.  In  looking  over  larger  samples  of  text 
containing,  for  example,  about  500  noun  constructions,  we  will 
not  find  such  significant  "clumps"  and  "cutoffs,"  since  the  in- 
fluence of  accidental  factors  is  limited. 
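
The effect of sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: each construction is treated as an independent draw, the "true" frequency of 0.05 for a rare case is invented, and the sample sizes 100 and 500 are the ones mentioned in the text.

```python
import random
import statistics


def sample_frequency(rng, n, p):
    """Observed frequency of an event with true frequency p in n draws."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


def spread_of_estimates(n, p, trials=2000, seed=0):
    """Standard deviation of the observed frequency over many samples of size n."""
    rng = random.Random(seed)
    estimates = [sample_frequency(rng, n, p) for _ in range(trials)]
    return statistics.pstdev(estimates)


ASSUMED_P = 0.05  # hypothetical true frequency of a rare case

spread_100 = spread_of_estimates(100, ASSUMED_P)
spread_500 = spread_of_estimates(500, ASSUMED_P)
print(f"spread with N=100: {spread_100:.4f}")
print(f"spread with N=500: {spread_500:.4f}")
```

Under these assumptions the scatter of the observed frequency shrinks roughly in proportion to the square root of the sample size, which is why the "clumps" and "cutoffs" become less pronounced in the larger samples.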

Let us designate the size of the sample by N, the frequency
of some case (for example, the genitive) observed in the sample
by $P^*_{gen}$, the frequency of the same case as occurring in the
whole text by $P_{gen}$. Then we can say that as $N \to \infty$, the
absolute value of the difference $P_{gen} - P^*_{gen}$ tends toward
zero, i.e.,

$$|P_{gen} - P^*_{gen}| \to 0 \quad \text{as } N \to \infty.$$


This  means  that  as  the  size  of  the  sample  increases,  the  value  of 
the  frequency  calculated  on  the  basis  of  the  sample  approaches 
the  value  of  the  frequency  existing  in  the  entire  body  of  text, 
i.e.,  the  difference  between  these  values  approaches  zero.  The 
difference $P - P^*$ is called the absolute error of measurement.
In order to characterize the degree to which the frequency
determined from the sample approaches the frequency in the
whole text, it is more useful to consider not the absolute but
the relative error, which is the ratio of this difference to the
measured value of P, i.e., $(P - P^*)/P$. The fact is that the
absolute error may be very small if the measured values of P and
P* themselves are small, but this difference may be very
significant in comparison with P and can greatly change the accuracy
of  the  measurements.  For  example,  if  in  measuring  a  line  100 
mm  in  length  we  obtain  an  absolute  error  of  5  mm,  and  in 
measuring  a  line  10  mm  long  we  obtain  the  same  absolute  er- 
ror, then  in  the  first  case,  the  error  amounts  only  to  0.05,  while 
in  the  second,  it  amounts  to  0.5  of  the  measured  magnitude. 
Thus,  the  accuracy  of  measurement  can  only  be  represented  by 
the  value  of  the  relative  error. 
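
The line-measuring example can be reproduced directly. The function names are illustrative only; the measured values of 95 mm and 5 mm are one way of realizing the 5 mm absolute error assumed in the text.

```python
def absolute_error(true_value, measured_value):
    """Absolute error: size of the difference between the two values."""
    return abs(true_value - measured_value)


def relative_error(true_value, measured_value):
    """Relative error: absolute error as a share of the true magnitude."""
    if true_value == 0:
        raise ValueError("relative error is undefined for a zero magnitude")
    return absolute_error(true_value, measured_value) / abs(true_value)


# Same absolute error of 5 mm on a 100 mm line and on a 10 mm line.
for true_length, measured_length in ((100.0, 95.0), (10.0, 5.0)):
    print(
        true_length,
        absolute_error(true_length, measured_length),
        relative_error(true_length, measured_length),
    )
```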

The  larger  the  sample,  the  more  accurate  the  calculation  ob- 
tained. But  there  arises  a  question  regarding  the  meaning  of 
the  expression  "accurate  calculation."  What  degree  of  accuracy 
will  suffice;  what  relative  error  can  be  allowed?  There  is  no 
general answer to this question. The limits of permissible error
arise from practical considerations. It seems reasonable to
demand, for example, that even in the roughest estimates, the
relative error $\delta$ should be less than 30 per cent of the
measured magnitude:

$$\delta = \frac{|P - P^*|}{P} < 0.3.$$

With the need for a more exact evaluation, the value of $\delta$ can
be set at 5 per cent, etc.

3.2.  Frequencies 

Sometimes  it  is  necessary  to  determine  the  frequency  of  some 
phenomenon  with  a  specified  accuracy,  i.e.,  with  a  specified  rela- 
tive error. 

For example: In order to determine the optimum arrangement
of the keyboard of a specialized typewriter for printing
mathematical text, one must determine the frequency of the
letter F in mathematics texts.⁸ We know that the letter F is
quite rare in literature, but we can show that in mathematics
it is more frequently encountered because of the occurrence of
such words as funktsiya [function], faktorial [factorial],
koeffitsient [coefficient], and differentsial [differential]. In a
practical problem, $\delta$ can be set at 10 per cent of the proposed
frequency of the F.

⁸ [That is, the equivalent Cyrillic letter. - Tr.]


According  to  some  preliminary  experimental  data,  the  letter 
F  occurs  about  twice  out  of  every  1,000  letters  of  Russian  math- 
ematics text.  In  studying  samples  of  text  about  1,000  letters 
long,  we  found  that  as  a  result  of  accidental  circumstances  F 
did  not  occur  at  all  in  the  first  sample,  while  it  was  encoun- 
tered four  times  in  the  second;  i.e.,  in  the  first  sample,  its  fre- 
quency was  seemingly  zero,  while  in  the  second,  it  was  0.004. 
The  frequency  observed  in  the  first  sample  was  essentially  dif- 
ferent from  that  in  the  second,  while  they  both  differed  from 
the  primary  data  (the  third  sample).  This  was  due  to  the  fact 
that  the  accidental  incidence  of  F  in  our  first  and  second  sam- 
ples completely  changed  the  result.  In  fact,  the  frequency  of 
this  letter  is  so  small  that  for  a  small  sample— 1,000  letters— ac- 
cidental factors  were  felt  very  strongly,  and,  therefore,  the  re- 
sults of  the  three  samples  differed  from  one  another  by  a  great 
deal  more  than  10  per  cent. 
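
As a worked check of the figures above, and assuming that the preliminary estimate of two occurrences per 1,000 letters is taken as the reference value P = 0.002, the relative errors of the first two samples are

$$\delta_1 = \frac{|0.002 - 0|}{0.002} = 1.0, \qquad \delta_2 = \frac{|0.002 - 0.004|}{0.002} = 1.0,$$

that is, 100 per cent in each case, ten times the 10 per cent allowed.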

If  we  increase  the  volume  of  the  sample  so  that  the  letter 
F  occurs  within  it  not  just  two  to  four  times  but  twenty,  then  the 
random  occurrence  of  one  or  two  extra  F's  in  the  sample  will 
not  change  the  result,  since  the  action  of  random  factors  is 
greatly limited,⁹ and the relative error of measurement will not
go  beyond  the  specified  limits. 

Obviously,  in  order  to  ascertain  the  frequency  of  the  letter 
U  (about  two  and  one-half  times  as  frequent  as  F)  with  the 
same  relative  error,  we  will  find  a  considerably  smaller  sample 
satisfactory. 

It follows from what has been said that for a given relative
error $\delta$, the smaller the frequency, the larger the sample
needed:

$$\text{if } P \to 0, \quad \text{then } N \to \infty.$$

The interrelation between the frequency P, the sample size N,
and the relative error $\delta$ can be illustrated by an elementary
statistical formula. This can be written in simplified form as

$$\delta = \frac{Z_p \sqrt{1 - P}}{\sqrt{N P}}, \qquad (1)$$

where $\delta$ is the relative error, N is the sample size, P is the
frequency, and $Z_p$ is a constant (found in a special table usually
presented in textbooks on probability theory). In the following
discussion, $Z_p$ will have the value 2.¹⁰

⁹ [This clause (beginning with "since") is a misrepresentation of probability
theory. - Tr.]

Using  this  formula,  we  can  solve  two  basic  problems  that 
arise  when  a  linguist  attempts  to  make  a  quantitative  evalua- 
tion. These  problems  are: 

(1)  Evaluation  of  the  reliability  of  results,  i.e.,  determina- 
tion of  the  relative  error  with  which  the  frequency  of  some 
phenomenon  is  calculated  from  a  particular  sample. 

(2)  Determination  of  the  sample  size  that  will  guarantee  the 
calculation  of  the  frequency  of  some  phenomenon  with  a  spe- 
cified relative  error. 
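
The two problems can be sketched as a pair of Python functions built on formula (1). The function names and the default of 2 for the constant are choices made for this sketch; the second function simply solves formula (1) for N.

```python
import math


def relative_error(n, p, z=2.0):
    """Problem (1): relative error of a frequency p estimated from a
    sample of size n, following formula (1)."""
    if n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need n > 0 and 0 < p < 1")
    return z * math.sqrt(1.0 - p) / math.sqrt(n * p)


def required_sample_size(delta, p, z=2.0):
    """Problem (2): smallest sample size for which formula (1) gives a
    relative error no larger than delta."""
    if delta <= 0 or not 0.0 < p < 1.0:
        raise ValueError("need delta > 0 and 0 < p < 1")
    return math.ceil(z * z * (1.0 - p) / (delta * delta * p))


# Illustrative values: a frequency of about 0.05 and a 10 per cent target.
n_needed = required_sample_size(0.10, 0.05)
print(n_needed, relative_error(n_needed, 0.05))
```

Since P is itself unknown, any call to `required_sample_size` has to be fed a preliminary estimate of the frequency, which is the difficulty the text returns to later.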

Let  us  consider  the  first  problem.  Its  "physical  meaning"  was 
presented  above,  and  we  already  know  that  given  a  large  sam- 
ple for  a  particular,  defined  frequency,  we  can  expect  better, 
more  reliable  results. 

For example: In Spanish, there is widespread alternation in
verbal roots: About forty kinds of alternation have been counted
(e.g., sentir - siente, sentir - sintio, saber - supo). But not all
forms with alternation are used equally often in texts. To perfect
a methodology for teaching Spanish and for several other
purposes, it is important to know which types of alternation
are basic from the standpoint of the frequency with which they
are encountered in texts. In order to be able to use the results,
we must be sure of their reliability. To be considered reliable,
conclusions about the frequency of forms with a certain type of
alternation must be obtained with a relative error of less than
10 per cent. Our samples will obviously be all forms of irregular
verbs encountered in text. We shall then determine the frequency
of each type of alternation; the number of forms with a
particular type of alternation will be compared with the total
number of forms of irregular verbs in the text under investigation.
For the sake of simplicity, we shall confine our example
to four types of alternation that are usually given first in
teaching irregular verbs, relegating the other types to a conditional
fifth type. Having determined the frequency of each type, we
shall calculate from formula (1) the error with which these
frequencies were calculated. The results of the observations are
shown in Table 1.

¹⁰ $Z_p = 2$ corresponds to the confidence level p = 0.95. Some elucidation of
the content of the concept of "confidence levels" is given below. For more
details on this, see [3]. Here and in what follows, frequencies are represented as
being distributed according to Gauss' law [the "normal" distribution - Tr.]
(see [3] and [29]).

[We have corrected formula (1), given incorrectly in the original text. Since P
is a number that cannot be known but only estimated, it is important to realize
that this formula is approximately valid for large samples when the estimate
P* is used instead of P. The proper interpretation of the formula is the
following: The probability is p (say, 0.95) that an estimate P*, based on a sample of
N observations, will have a relative error (|P* - P|)/P less than $\delta$ as
calculated by formula (1). Cf. Alexander M. Mood, Introduction to the Theory of
Statistics, McGraw-Hill Book Co., Inc., New York, 1950, pp. 220-222 and 236. - Tr.]

TABLE 1

(1)            (2)                                  (3)                  (4)
Alternation-   Characteristics of                   Frequency of         Relative
type number    alternation                          alternation type     error

I              e/ie (sentir - siente)               655/1721 ≈ 0.38      ±7.8%
II             o/ue (morir - muere)                 495/1721 ≈ 0.29      ±9.04%
III            e/i (pedir - pido)                   83/1721 ≈ 0.05       ±22%
IV             no letter/y (construir - construye)  87/1721 ≈ 0.05       ±21%
V              All others                           401/1721 ≈ 0.23      ±10%

The total number of irregular-verb forms is 1721.

In column 3 of Table 1, the frequency of each type is written
as a proper fraction, namely, the ratio of the number of
occurrences of the "appearance of a certain type of alternation" (i.e.,
favorable outcomes) to the number of possible occurrences (i.e., to
the number of forms with alternations in the text being studied):
Type I was encountered 655 times, II 495 times, III 83 times,
IV 87 times, and V (conditional) 401 times. In all,
655 + 495 + 83 + 87 + 401 = 1,721 forms with alternations
occurred in the text (1,721 possible occurrences). For clarity, the
frequency of each type is shown in decimals, as well. For example,
the frequency of type I is 0.38. This means that for each
100 forms with alternation, about 38 forms had type I alternation.
For reasons already stated, the sample-determined frequency
is only approximate. The relative error, in column 4,
shows the limits of this approximation, to wit: If the frequency
0.38 is determined with a relative error of ±7.8 per cent, this
means that in scanning any Spanish text, we shall find, for every
100 forms of verbs with alternation, from [38 - 0.078(38)] to
[38 + 0.078(38)] forms of type I. Since 0.078(38) is approximately
equal to 3,¹¹ this means that the number of type I forms
will oscillate from 35 to 41 for every 100 occurrences of forms
with alternation.¹²
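
The entries of Table 1 can be recomputed from the raw counts. In this sketch the printed relative errors appear to agree with the shortened expression $Z_p/\sqrt{NP}$, that is, formula (1) without the factor $\sqrt{1 - P}$; this is an observation about the printed numbers, not a statement made by the authors, and both variants are computed for comparison.

```python
import math

# Counts of forms per alternation type, as given in the text.
COUNTS = {"I": 655, "II": 495, "III": 83, "IV": 87, "V": 401}
TOTAL = sum(COUNTS.values())  # total number of irregular-verb forms
Z_P = 2.0                     # constant used throughout the text


def table_row(count, total=TOTAL, z=Z_P):
    """Return (frequency, error by formula (1), error without sqrt(1 - P))."""
    p = count / total
    delta_full = z * math.sqrt(1.0 - p) / math.sqrt(total * p)
    delta_short = z / math.sqrt(total * p)
    return p, delta_full, delta_short


rows = {name: table_row(count) for name, count in COUNTS.items()}
for name, (p, delta_full, delta_short) in rows.items():
    print(f"{name:>3}  P={p:.2f}  full={delta_full:.1%}  short={delta_short:.1%}")

# Limits for type I per 100 forms, using the shortened value as in the text.
p_one, _, delta_one = rows["I"]
lower_limit = 100 * p_one * (1 - delta_one)
upper_limit = 100 * p_one * (1 + delta_one)
print(round(lower_limit), round(upper_limit))
```

With the factor $\sqrt{1 - P}$ retained, every relative error comes out somewhat smaller than the printed one, so the printed column is the more cautious of the two.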

According  to  mathematical  statistics,  there  can  also  be  in- 
stances in  which  for  every  100  forms  with  alternation,  type  I 
occurs  less  than  35  times  (we  shall  call  this  number  the  lower 
limit  of  frequency)  or  more  than  41  times  (the  upper  limit); 
however,  such  large  deviations  will  be  encountered  only  very 
rarely. 

The  constant  Zp  in  the  formula  for  determining  the  relative 
error  defines  the  number  of  occurrences  of  such  large  devia- 
tions which  go  beyond  the  lower  and  upper  limits.  In  particu- 
lar, the  value  we  chose  for  the  constant,  Zp  =  2,  corresponds  to 
p  =  0.95,  which  means  that  in  95  out  of  100  samples,  type  I  fre- 
quency will  oscillate  (for  each  100  verbal  forms  with  alterna- 
tion) within  the  indicated  bounds,  which  it  can  exceed  in  only 
5 samples.¹³


¹¹ The number of occurrences is not expressed as a fraction.

¹² [Or rather, from 35 per cent to 41 per cent in samples of 1,721 occurrences
of forms with alternation. - Tr.]

¹³ [That is, in 95 out of 100 samples of 1,721 occurrences. - Tr.]
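
The correspondence between the constant and the confidence level can be checked against the normal distribution, which the text assumes for the frequencies. The sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names are illustrative.

```python
from statistics import NormalDist


def coverage(z):
    """Probability that a normally distributed value falls within
    z standard deviations of its mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


def z_for_confidence(p):
    """Constant whose two-sided coverage equals the confidence level p."""
    return NormalDist().inv_cdf(0.5 + p / 2.0)


print(f"{coverage(2.0):.4f}")
print(f"{z_for_confidence(0.95):.2f}")
```

The value 2 is therefore a rounded form of the constant: it covers slightly more than 95 per cent of samples, while an exact 95 per cent corresponds to a constant slightly below 2.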
Comparing the values of relative error written in column 4,
we  note  that  not  all  the  results  are  equally  precise.  For  exam- 
ple, the  accuracy  with  which  the  type  III  frequency  was  deter- 
mined is  clearly  not  sufficient  according  to  the  accepted  crite- 
rion. This  happens  in  a  sample  of  fixed  size  because  a  smaller 
frequency  (and  type  III,  obviously,  occurs  more  rarely)  will 
cause  a  greater  relative  error.  Consequently,  if  one  is  to  obtain 
results  for  type  III  as  accurate  as  those  for  type  I,  one  must  en- 
large the  sample. 

We  come,  thus,  to  the  second  problem:  determination  of  the 
sample  size  that  will  guarantee  a  specific  accuracy.  According 
to  the  accepted  criterion,  the  relative  error  may  not  exceed  0.1. 
The frequency of type III is P = 83/1,721 ≈ 0.05. It has a relative
error of ±22 per cent. We have thus found that for each
1,000  forms  with  alternation,  there  will  occur  approximately 
38  to  58  forms  of  type  III.  In  producing  the  sample,  we  cannot 
know  beforehand  how  many  times  the  form  being  sought  oc- 
curs in  it,  but  we  can  foresee  that  the  required  accuracy  of  10 
per  cent  will  be  guaranteed  even  if  a  minimal  number  of  type 
III  forms,  corresponding  to  the  lower  limit,  is  found  in  a  given 
concrete sample. Thus, in determining the size of the sample,
we will start from the lower bound of the frequency of type III,
0.038; according to (1):

$$0.1 = \frac{2}{\sqrt{0.038\,N}}, \qquad \text{whence } N \approx 10{,}500;$$

i.e.,  in  order  to  define  the  frequency  of  type  III  with  an  accu- 
racy of  not  less  than  ±10  per  cent,  one  must  scan  a  text  con- 
taining not  less  than  10,500  forms  with  alternations. 
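The step from the equation to the sample size can be spelled out. This is a sketch that assumes formula (1) has the form δ = 2/√(pN), which is what the figures in this section imply (for instance, 2/√83 ≈ 0.22 for the 83 occurrences of type III); the symbols δ (relative error) and p (frequency) are labels chosen here for illustration.

```latex
\delta = \frac{2}{\sqrt{pN}}
\;\Longrightarrow\;
N = \frac{4}{\delta^{2}\,p},
\qquad
N = \frac{4}{(0.1)^{2}\times 0.038} = \frac{400}{0.038} \approx 10{,}526 \approx 10{,}500 .
```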

It is exactly the same in determining the frequency of the letter
F (see pp. 89-90) with the same relative error as for U; one
must take a much larger sample for F than for U.^14

In connection with what has already been said, we must now
turn our attention to one mistake commonly made by linguists
in formulating their problems. The question is usually posed:
We want to find the frequency of a certain form (phoneme,
word, etc.); how do we calculate the size of the sample needed?


^14 [The Cyrillic letter transliterated by U is common; that transliterated by F
is rare. Tr.]
It was emphasized earlier that the concept "necessary size of the
sample" has no meaning if one has not defined the degree of
accuracy to be guaranteed. But in order to use formula (1) one
must also know with what approximate frequency to begin one's
calculations; otherwise, there will be two unknowns in the formula.
This dependence works in the other direction as well: for a given
permissible error, the lower the frequency to be determined, the
larger the sample one must take. But it is this very determination of the
frequency that is the goal of the study. The way out of this difficulty
is through a preliminary experiment.^15

The preliminary experiment can be conducted as follows.
From a small sample, one determines the frequency of the event
being studied, with a certain relative error. Then, proceeding from
the lower limit of frequency (pp. 93-94) and from the relative
error permissible for final results, one calculates the necessary
size of the sample.

Take,  for  example,  the  determination  of  the  frequency  of 
type  III  alternation  in  the  root  of  the  Spanish  verb.  We  can 
consider  the  calculation  in  Table  1  for  type  III  to  be  the  pre- 
liminary experiment;  initially,  in  a  sample  containing  1,721 
forms  with  alternation,  the  frequency  of  type  III,  the  error,  and 
the  lower  limit  of  frequency  were  established.  Then,  from  the 
lower  limit  of  frequency  and  the  required  accuracy,  the  neces- 
sary size  of  the  sample  was  found  to  be  10,500. 
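The preliminary-experiment procedure can be sketched in a few lines of Python. This is only an illustration under the assumption that formula (1) reads δ = 2/√(pN), as the worked figures suggest; the function names and the constant `Z` are hypothetical labels introduced here, not part of the original method description.

```python
import math

Z = 2.0  # assumed multiplier in formula (1); it reproduces the 22 per cent quoted for 83 occurrences


def relative_error(occurrences: float) -> float:
    """Relative error of a frequency based on `occurrences` hits: Z / sqrt(p * N)."""
    return Z / math.sqrt(occurrences)


def required_sample_size(lower_frequency: float, target_error: float) -> int:
    """Smallest N for which Z / sqrt(lower_frequency * N) <= target_error."""
    return math.ceil(Z ** 2 / (target_error ** 2 * lower_frequency))


def plan_from_pilot(occurrences: int, pilot_size: int, target_error: float):
    """Preliminary experiment: frequency, its relative error, lower limit, final N."""
    frequency = occurrences / pilot_size
    error = relative_error(occurrences)
    lower_limit = frequency * (1.0 - error)
    return frequency, error, lower_limit, required_sample_size(lower_limit, target_error)


if __name__ == "__main__":
    # Pilot data from the type III example: 83 occurrences in 1,721 forms.
    freq, err, low, n = plan_from_pilot(83, 1721, 0.1)
    print(f"frequency={freq:.3f}  error={err:.2f}  lower limit={low:.3f}  N={n}")
```

Because the sketch does not round the lower limit up to 0.038, the sample size it reports comes out somewhat above the rounded figure of 10,500.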

We  have  discussed  here  some  of  the  simplest  methods  of 
treating  the  results  of  observations  with  regard  to  evaluating 
their  reliability. 

Linguistic studies have appeared recently that apply methods
developed in mathematical statistics for the design of experiments
and evaluation of reliability. In this connection, Yule's
book, The Statistical Study of Literary Vocabulary [48], B. Epstein's
introductory article in Josselson's frequency glossary
[29], articles by American psychologists in the anthology Studies
in Language Behavior [39], and several other papers ([4],
[34], and [35]) are all very interesting.


^15 In some cases, the numerical bounds of the frequencies we want to obtain
in the course of the study can be evaluated on the basis of data already existing
in the literature.

4. A Statistical Approach to the Description of Individual
Linguistic Phenomena

A  statistical  approach  to  the  description  of  individual  linguistic 
phenomena  is  not  something  new.  The  first  statistical  studies  of 
lexicology  appeared  at  the  end  of  the  last  century.  From  then 
on,  a  large  volume  of  literature  has  accumulated,  primarily  on 
the  statistics  of  words  and  sounds  (or  phonemes)— [4],  [13], 
[18],  [26],  [40],  [41],  [44],  etc. 

Certain of the methods used in these studies deserve attention,
but the most interesting studies generally employ a
complex mathematical apparatus that is not accessible to most
linguists. Therefore, it seems useful in the present short work to
turn our attention to the essence of the statistical approach in
the studies mentioned, at the same time reducing the mathematical
apparatus to a minimum.

We  shall  pay  particular  attention  to  (1)  attempts  to  describe 
lexicology  statistically;  (2)  studies  of  the  composition  of  words, 
with  respect  to  the  number  of  syllables;  (3)  research  using  sta- 
tistical methods  to  study  the  rhythmic  structure  of  poetry. 

4.1. Statistical Methods in Lexicological Research^16

The  first  attempts  to  apply  statistical  methods  in  describing 
the  facts  of  language  are  connected  with  the  compilation  of  so- 
called  "frequency  dictionaries"  (the  first  we  know  of  is  Kaed- 
ing's  dictionary  [30],  published  in  1898). 

A frequency dictionary is a list of words (a vocabulary) in
which every word carries an indication of its occurrence frequency
in a text of a certain length. For example, Kaeding studied
texts with a total length of 11 million words, compiled their
vocabulary, and counted the frequency of each word
throughout the 11 million running words. Frequency dictionaries
permit one to compare words from the standpoint of their
usage frequency; hence, their sphere of application is rather
large. The data of frequency dictionaries are of great theoretical
interest for studies of certain properties of text with regard
to its relation to the vocabulary and to the frequencies of words
composing a given text.


^16 Here and in what follows, "lexicology" and "dictionary" refer to the dictionary
form of a text, a vocabulary. The lexical structure of an individual word
and the system of its meanings are not considered.

Let  us  go  into  this  a  little  further. 

(a)   The  statistical  structure  of  text. 

Suppose we have a certain text N words in length and its list
of (different) words L in which N > 1 and L > 1. Detaching
ourselves from the properties of real text, we can imagine that
the following relations exist between wordlist and text:

(1) N = L (all the words in the text are different);

(2) N > L (some of the words are repeated).

Most real text corresponds to the second type. If we fix the
value of N (say, N = 10,000), then we cannot say anything
about L beforehand, other than the fact that it is less than
10,000.^17 If we fix the value of N and the value of L (say, N =
10,000 and L = 2,000), then several quite distinct situations are
theoretically possible. For example:

Text  1.  1,500  words  in  wordlist  L  occur  once,  accounting  for 
1,500  occurrences,  while  the  other  8,500  words  of  the  text  con- 
sist of  repetitions  of  500  words  in  the  list. 

Text  2.  50  words  from  list  L  occur  once,  and  9,950  text  words 
are  repetitions  of  1,950  words  in  the  list,  etc. 

This  means  that  the  "structure"  of  the  text,  from  the  stand- 
point of  word-repetition,  is  still  not  defined  either  by  the  length 
of  the  text  or  by  the  size  of  the  wordlist  but,  rather,  by  the  num- 
ber of  individual  groups  of  words  repeated  a  specific  number  of 
times.  In  order  to  describe  the  structure  of  text  1,  it  is  clearly 
essential  to  show  not  only  that  there  are  words  in  it  occurring 
only  once,  because  there  are  also  such  words  in  text  2,  but  also 
that  the  number  of  words  in  the  group  with  text-frequency  1  is 
1,500,  while  this  group  in  text  2  is  thirty  times  smaller. 

Suppose we describe the "structure" of the text from the
standpoint of word repetition, showing how many words have
frequency 1, 2, etc. We are not dealing here with individual
properties of words,^18 and we extract only that property which
is important for characterizing text from a particular standpoint.


^17 We shall deal later with the question of how the size of the wordlist L is
related to the length of text N.

Since  we  know  that  text  1  contains  1,500  words  with  a  fre- 
quency of  1,  and  text  2  has  only  50  such  words,  we  can  assert 
that  the  chance  of  randomly  selecting  from  the  text  one  word 
with  a  frequency  of  1  (or,  to  put  it  another  way,  the  probability 
of  a  word  with  frequency  1)  is  thirty  times  greater  for  text  1 
than  it  is  for  text  2. 

We  shall  call  the  "structure"  of  the  text,  in  the  sense  indicated 
above,  its  "statistical  structure."  We  shall  consider  that  the  sta- 
tistical structure  of  the  text  is  known  if,  for  any  possible  fre- 
quency of  a  word,  the  probability  of  randomly  selecting  from 
text  a  word  with  the  given  frequency  is  known. 
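The definition above can be sketched in Python. This is a minimal illustration, not the author's procedure: the function names and the toy sentence are hypothetical, and two readings of "randomly selecting a word" are shown side by side, drawing a distinct word from the wordlist and drawing a running word from the text, since the two give different probabilities.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency value xi to the number of distinct words occurring xi times."""
    word_counts = Counter(tokens)
    return Counter(word_counts.values())


def structure_by_wordlist(tokens):
    """P(xi) when a distinct word is drawn at random from the wordlist (length L)."""
    spectrum = frequency_spectrum(tokens)
    wordlist_length = sum(spectrum.values())
    return {xi: n / wordlist_length for xi, n in sorted(spectrum.items())}


def structure_by_text(tokens):
    """P(xi) when a running word is drawn at random from the text (length N)."""
    spectrum = frequency_spectrum(tokens)
    text_length = len(tokens)
    return {xi: xi * n / text_length for xi, n in sorted(spectrum.items())}


if __name__ == "__main__":
    # Toy text, assumed purely for illustration.
    tokens = "the cat saw the dog and the dog saw a bird".split()
    print(dict(frequency_spectrum(tokens)))
    print(structure_by_wordlist(tokens))
    print(structure_by_text(tokens))
```

In either reading the probabilities sum to one; the choice between them only decides whether rare words are weighted by how many of them there are or by how much text they occupy.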

The "possible" frequency (i.e., the scale of values for the frequency
of a word) is defined by the fact that a word cannot
have frequency 0 (we are considering only words actually encountered
in a given text); and since L > 1 and L < N (see
p. 97), the word cannot have the frequency N, for this would
mean that the text consisted of repetitions of one single word.
Thus, a word's frequency (we shall call it ξ) can take any integer
value within the interval 1 ≤ ξ < N.

The chance of randomly selecting from the text^19 a word having
a particular frequency depends on what proportion of words
with a given frequency exists among the other text words. To
find the probability of randomly selecting from text^20 a word
with ξ = 1, one must divide the number of words with ξ = 1
(there are 1,500 of them) by the text length^21 (10,000). We write
this as follows:

$$P\{\xi = 1\} = \frac{1{,}500}{10{,}000} = 0.15;$$

^18 We are assuming that we know for sure what words to call individual, and
which to consider different.

^19 [The author should have said "from the wordlist." Tr.]

^20 [From the wordlist. Tr.]

^21 [By the length of the wordlist, which in this case is L = 2,000. To estimate
the probability of selecting from text a word that occurs ξ times, multiply ξ by
the number of words in the wordlist that occur ξ times (say, L_ξ) and divide by
N: P{ξ} = ξ L_ξ / N. In the example, ξ = 1 and L_ξ = 1,500, hence P{ξ = 1} =
(1)(1,500)/10,000 = 0.15. But take ξ = 2 and L_ξ = 500; then P{ξ = 2} = (2)(500)/
10,000 = 0.10, and the formula in the text is incorrect. Tr.]
We write the probability of selection of words with frequencies
2, 3, etc., in the same way. After we have calculated all of
these probabilities, we can say that we know the statistical structure
of the text, or, to put it another way, the distribution of
probabilities with which the random variable "frequency of a
word" takes some value or another.

If we pursue the procedure described here, using the information
about word frequency presented in frequency dictionaries,
and compare the statistical structures (distributions of
probabilities) obtained for various languages, then it turns out
that they are extremely similar. The general aspect of the probability
distribution obtained can be expressed graphically as
follows: We take a system of rectilinear coordinates, and plot
on the x-axis the scale of values for the frequency ξ of a word.
On the y-axis, we plot the probability P with which a word's
frequency takes a certain value. (Working with one text, we
need not always divide the number of words with a particular
frequency by the length of the text,^22 but can merely plot the
number of words with a given frequency directly along the y-axis.)
Joining these points, we obtain a curve having approximately
the form presented in Figure 3.^23

Experiments show that such graphs, drawn from material on
any language, have common features:^24

1. The curve showing the number of words as a function of occurrence
frequency has one peak (the maximum point), located
at ξ = 1. This obviously means that in any sufficiently large
text there are more words having frequency 1 than words having
frequencies of 2 or 20.

2. After ξ = 1, a sharp decrease in probability occurs, so that
the highest possible value of ξ usually corresponds to y = 1;
this means that in any one text there is only one most frequent
word. We can conclude from this that not every theoretically
constructed statistical structure for text (see p. 97) corresponds
to real texts. For example, no texts have been observed
up to now for which the probability distribution curve has more
than one peak, or a peak at ξ > 1, etc.


^22 [By the length of the wordlist. Tr.]

^23 Such a graph is sometimes called a Yule distribution [48].

^24 We note with interest that if we do not study all the words in a text, but
only nouns or only verbs, the graph (or a table corresponding to it) has generally
the same character.

We  can  assume  that  since  the  general  features  described  are 
inherent  in  the  statistical  structures  of  various  texts  in  various 
languages,  they  are  connected  with  some  peculiarities  of  the 
functioning  of  language  as  such,  and  are  not  connected  with  the 
"language  of  Pushkin"  as  opposed  to  the  "language  of  Tolstoy," 
or  with  the  English  language  as  opposed  to  French. 

A detailed study of the statistical structures of various texts
reveals that, aside from the general, similar features already
described, there are also essential differences among texts from
the standpoint of frequency of word distribution. This becomes
evident, for example, when one compares the statistical structures
of the texts of various authors writing in one language
(Figure 4).

Figure 3. The Probability P with Which a Word's Frequency Takes a
Certain Value as a Function of the Frequency ξ.

Curves  I  and  II  (Figure  4),  which  describe  word  distri- 
bution for  the  works  of  various  authors,  have  the  general  fea- 
tures shown  in  Figure  3  but,  at  the  same  time,  they  are  quite 
different  from  each  other;  e.g.,  in  text  I,  there  are  more  low- 
frequency  words  (1,  2,  3)  than  in  text  II,  and  fewer  words  of 
very  high  frequency,  i.e.,  there  are  more  different  words  and 
fewer  repetitions. 

In this sense, one can say that the wordlist for text I is more
varied than that for text II, or that for someone studying this
foreign language, it will be easier to read II, etc.; in general,
one can make several comprehensive judgments about each of
these texts, basing them on the nature of their statistical structure.

We note that in linguistics one often speaks of the "wealth"
of a vocabulary, or of its "homogeneity," about a similarity or
a difference in the styles of various texts or authors, basing such
remarks primarily on intuitive feelings, and sometimes accompanying
them with statements that certain words or forms are
encountered "more often" or "less frequently" in author A's
works than in those of author B. Comparative study of statistical
structures sharpens such subjective remarks to greater accuracy,
and allows one to introduce exact measurements for such characteristics
as wealth of vocabulary, and the similarity or difference
between texts from the standpoint of the use of specific
word classes, such as archaic, neutral, stylistically colored, or
dialect words.

Figure 4. Word Distributions for Works of Different Writers. (In
text I, there are more low-frequency words and fewer high-frequency
words than in text II.)

Since  the  concept  of  "style"  presupposes  the  presence  of  sev- 
eral properties  inherent  in  a  given  text  (or  texts)  or  author,  as 
opposed  to  others— i.e.,  because  it  is  based  on  a  comparison  and 
presentation  of  the  similar  and  the  different— it  is  reasonable 
to  propose  that  style  is  the  sum  of  statistical  characteristics  de- 
scribing the  content  properties  of  a  particular  text  as  distinct 
from  others. 
For example, for describing the characteristic of vocabulary
"wealth," we naturally choose characteristics uniquely defining
curve I as differing from curve II, or defining the form of II as
opposed to any other, etc. Such a method was applied successfully
in the book of the famous English statistician G. U. Yule,
The Statistical Study of Literary Vocabulary [48], in which the
reader may become fully acquainted with the essential numerical
characteristics already discussed.

It  is  especially  interesting  to  compare  the  statistical  structures 
of  texts  in  determining  the  authorship  of  anonymous  works 
(provided  we  are  already  familiar  with  a  certain  amount  of  text 
known  to  be  from  the  pen  of  the  author  supposed  to  have  writ- 
ten the  anonymous  material). 

The  methodology  of  a  detailed  study  has  the  following  gen- 
eral features: 

(1)  A  graph  (or  table)  showing  the  word  distribution  of  the 
anonymous  work  is  drawn  (Yule  studied  only  noun  distribu- 
tion; it  might  have  been  more  effective  to  tabulate  other  un- 
ambiguous parts  of  speech  as  well). 

(2)  Analogous  tables  are  made  for  texts  by  two  (or  more) 
authors  supposed  to  have  written  the  anonymous  text. 

(3)  For  each  table  (anonymous  text,  author  I,  author  II, 
etc.),  numerical  characteristics  are  collected,  describing  in  suf- 
ficient detail  the  differences  among  the  distributions  in  the  dif- 
ferent texts.  Then  these  numerical  characteristics  are  com- 
pared, and  if  those  of  the  anonymous  text  more  closely  resem- 
ble those  of  author  II  than  of  I,  we  have  reason  to  conclude 
that  the  anonymous  text  is  by  author  II. 
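The three steps above can be sketched in Python. The text does not say which numerical characteristics are to be compared, so this example assumes a single one, Yule's characteristic K, defined as 10^4 (S2 - N) / N^2, where S2 is the sum of ξ² times the number of words with frequency ξ. The function names and the toy data are hypothetical, and a real study would compare several characteristics on much longer texts.

```python
from collections import Counter


def yules_k(tokens):
    """Yule's characteristic K = 10^4 * (S2 - N) / N^2 computed from the frequency spectrum."""
    spectrum = Counter(Counter(tokens).values())  # frequency -> number of distinct words
    n = sum(xi * v for xi, v in spectrum.items())        # text length N
    s2 = sum(xi * xi * v for xi, v in spectrum.items())  # second moment of the spectrum
    return 1e4 * (s2 - n) / (n * n)


def closest_author(anonymous_tokens, candidate_texts):
    """Return the candidate whose K lies nearest to that of the anonymous text."""
    target = yules_k(anonymous_tokens)
    return min(candidate_texts, key=lambda name: abs(yules_k(candidate_texts[name]) - target))


if __name__ == "__main__":
    # Toy samples, assumed only to show the mechanics.
    anonymous = "x x x y".split()
    candidates = {"author I": "p q r s".split(), "author II": "m m m n".split()}
    print(closest_author(anonymous, candidates))
```

A single characteristic is a deliberate simplification here; nearness on one number is weak evidence, which is why the procedure in the text asks for characteristics that describe the distributions "in sufficient detail."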

(b)  The  relation  between  text  length  and  the  size  of  the 
wordlist. 

One aspect of the relationship between texts and wordlists is
the relation between text length (measured in words^25) and
length of wordlist. We have already observed that even if we
know the text length, N, we still cannot say anything about the
size of the wordlist, L.

Of course, one can be sure a priori that in a novel 200 pages
long there are more different words than in a three-page story,
but experiment has shown that 100 pages of the novel contain
more than half as many different words as the whole novel. On
the other hand, equally long works by different authors often
differ greatly in wordlist length; sometimes a shorter text by
one author has a broader vocabulary than a longer one by another.


^25 The formulation of the question does not depend on the definition of
"word"; text length could even be measured by the number of printed characters.

Apparently, one cannot obtain through logical discussion an
answer to questions regarding the general character of the connection
between L and N. However, for any concrete text, one
can construct a graph (see Figure 5) of the function L = F(N).
(This is read "L is a certain function of N.") Here, the
length of a text in words is plotted along the x-axis, and the
number of different words used in it is plotted along y. We
shall then obtain a dotted line joining the plotted points that
closely approximates the curve shown in the graph. The nature
of the connection between L and N was studied by Chotlos in
school publications of American youth [39], by W. Kuraszkiewicz
in Polish literary texts of the 16th century [31], and by Garcia
Hoz in various texts of Spanish literary and commercial prose
[22].

Figure 5. Length L of Wordlist as Function of Length N of Text.
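For a concrete text, the points of the curve L = F(N) can be collected as follows. This is a minimal sketch; the function name, the `step` parameter, and the toy token list are assumptions made for illustration, and "word" here simply means a whitespace-separated token.

```python
def vocabulary_growth(tokens, step=1):
    """Return (N, L) pairs: L distinct words found among the first N running words."""
    seen = set()
    points = []
    for n, word in enumerate(tokens, start=1):
        seen.add(word)
        # Record a point every `step` words and always at the end of the text.
        if n % step == 0 or n == len(tokens):
            points.append((n, len(seen)))
    return points


if __name__ == "__main__":
    tokens = "a b a c b d".split()
    for n, size in vocabulary_growth(tokens, step=2):
        print(n, size)
```

Plotting these pairs with N on the x-axis and L on the y-axis gives the kind of graph described for Figure 5.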

Experiments have shown that the graphs for L = F(N) have
several features in common, apparently not dependent upon the
author's language or the character of the text. With increase in
text length, the rate of increase of L decreases; the curve always
has a parabolic form, convex upward. On the basis of these
observations, attempts have been made to obtain an empirical
formula for L = F(N) that is generally applicable to all texts.
Carroll proposed an empirical formula having the form:

$$L = \frac{N}{k}\,(0.423 + k - \ln N + \ln k),$$

where L is the number of different words, N is the text-length,
and k is an empirical constant.^26

If  Carroll's  formula  were  true  for  every  text  or  for  some  par- 
ticular group  of  texts,  then  for  any  text  once  one  had  found  the 
value  of  k,  one  could  always  determine  L,  given  N,  and  con- 
versely. 

But Chotlos (in [39]) has shown that this formula is erroneous.
He proposed another empirical formula:

$$L = \frac{N\,(a - \ln N)}{b},$$

where a and b are empirical constants. In Chotlos' experiment,
where a = 11.670 and b = 11.268, this formula describes the
curve for increase in new words for N < 18,000 nicely; but a
check of this formula shows that when N > 18,000, it is unsatisfactory.
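Chotlos' formula is easy to evaluate numerically. The sketch below reads the constants as the decimals a = 11.670 and b = 11.268, which is an assumption: read as thousands they would make L exceed N, which is impossible. The function name is hypothetical.

```python
import math


def chotlos_wordlist_size(n, a=11.670, b=11.268):
    """Chotlos' empirical estimate L = N * (a - ln N) / b (constants assumed to be decimals)."""
    return n * (a - math.log(n)) / b


if __name__ == "__main__":
    for n in (1_000, 5_000, 18_000):
        print(n, round(chotlos_wordlist_size(n)))
```

Under these constants the expression stops growing near N = e^(a-1), roughly 43,000 words, and falls afterwards, which no real wordlist can do; this fits the remark that the formula is unsatisfactory for long texts.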

The  attempts  of  Kuraszkiewicz  to  obtain  a  function  for  L 
=  F(N)  were  no  more  successful  for  comparing  the  vocabu- 
laries of  various  authors  with  respect  to  text-length.  Thus,  al- 
though this  function  can  be  given  for  each  concrete  text  in  a 
graph  or  table,  an  analytic  description  has  not  yet  been  found 
even  for  concrete  texts.  Moreover,  we  do  not  know  whether  a 
general  analytic  function  exists  for  expressing  the  connection 
between  wordlist  and  text. 

(c)  Determination  of  the  connection  between  a  word's  fre- 
quency and  its  rank  by  decreasing  frequency  ("Zipf's  law"). 

In linguistic literature much attention has been devoted to
the study of a relation known as "Zipf's law" ([15], [25], [33],
[42], [49]). The "law" itself, formulated before Zipf by J. B.
Estoup [19] in 1916 and by E. U. Condon [17] in 1928, is as
follows. Let us imagine a text N words long to which is attached
a wordlist L words long, with an indication on every word of
text-occurrence frequency. The words in the list are arranged
in order of decreasing frequency and are renumbered from 1
(the number of the most frequent word) to L. We shall designate
a word's frequency by p_i, and its rank by r_i, where i can
assume any integral value in the interval 1 ≤ i ≤ L. The
text's wordlist has the form:

    r_i     p_i
    1       p_1
    2       p_2
    ...     ...
    r       p_r
    ...     ...
    L       p_L


^26 Expanded in Chotlos' paper in [39].


Let us imagine that each of the N words in the text is marked
with a number r_i corresponding to the location of the word in
this wordlist. Then the most frequent word (with frequency p_1) in
the text will be found first [on the list. Tr.], a somewhat less
frequent one (with frequency p_2) will be second, etc. Zipf's law,
in defining the connection between a word's frequency and its
rank in a list ordered on decreasing frequency, allows us to approximate
the proportion of words with a given rank, or, to
put it somewhat differently, to approximate the probability of
a randomly selected word's having a certain rank.

We can write the formula for Zipf's law^27 in the following general
form:

$$P[r = i] = k\,r^{-\gamma},$$

where r is the rank of a word; P is the probability that this
word has a rank equal to i (i.e., P is the frequency of the ith
word); k and γ are empirically defined constants, approximately
valid, in general, for all ordinary texts in European languages.

^27 Zipf's law is introduced here in a form made precise by subsequent research
(see [15] and [33]).


Essentially, Zipf's law assumes that once one has determined
the constants k and γ, one can calculate the frequency of a word
from its rank (and conversely) by using the formula, and can
also solve various problems for which one must know ranks
and corresponding frequencies.
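One common way to determine the two constants from an observed rank list is a least-squares line through the points (log r, log p). The text does not prescribe a fitting method, so the technique, the function name `fit_zipf`, and the synthetic data below are all assumptions made for illustration.

```python
import math


def fit_zipf(frequencies):
    """Fit log p = log k - gamma * log r by least squares.

    `frequencies` must be relative frequencies sorted in decreasing order,
    so that the first entry belongs to rank 1.
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(p) for p in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    k = math.exp(mean_y - slope * mean_x)
    return k, -slope  # (k, gamma)


if __name__ == "__main__":
    # Synthetic frequencies generated from the formula itself.
    synthetic = [0.1 * r ** -1.01 for r in range(1, 51)]
    k, gamma = fit_zipf(synthetic)
    print(round(k, 4), round(gamma, 4))
```

On real word counts the points rarely lie on a straight line over the whole range of ranks, so the fitted values depend on which ranks are included.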

Thus, if it is always possible to find a word's frequency, p_1,
p_2, ..., p_m, then it is always possible to find the total frequency
of the m most frequent words, i.e., the percentage of text covered
by a certain number of the most frequent words:

$$\sum_{r=1}^{m} p_r = \sum_{r=1}^{m} k\,r^{-\gamma}.$$

If we assume that k = 0.1, γ = 1.01, and m = 1,100, then we
obtain $\sum_{r=1}^{m} p_r \approx 0.8$, i.e., the 1,100 most frequently used words
constitute about 80 per cent of the text occurrences. We can show
several other problems whose solutions are based on the application
of Zipf's formula ([10], [39]). The unsatisfactory aspect
of Zipf's relation consists in the fact that the constants k
and γ are far from being as "general" as Zipf asserted. But this
is too specific a question, and we shall not discuss it here.
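The coverage sum can be evaluated directly. The sketch below uses the constants quoted in the text as defaults; the function name `zipf_coverage` is hypothetical.

```python
def zipf_coverage(m, k=0.1, gamma=1.01):
    """Share of running text covered by the m most frequent words under p_r = k * r**(-gamma)."""
    return sum(k * r ** -gamma for r in range(1, m + 1))


if __name__ == "__main__":
    for m in (10, 100, 1_100):
        print(m, round(zipf_coverage(m), 3))
```

Direct summation for m = 1,100 gives a value somewhat below the rounded figure quoted above, so that figure is best read as an approximation.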

4.2.  Word-Distribution  by  Number  of  Syllables  in 
Various  Languages 

For  many  languages,  the  number  of  syllables  in  a  randomly 
selected  word  is  a  random  variable;  there  are  one-,  two-,  three- 
syllable  words,  etc.  One  can  assert  that  in  texts  in  any  given 
language,  there  is  a  preponderance  of  words  with  a  particu- 
lar number  of  syllables;  for  example,  there  are  approximately 
twice  as  many  bisyllabic  as  monosyllabic  words  in  language  A, 
and  as  many  three-  as  four-syllable  words,  etc.,  while  another 
ratio  exists  in  language  B,  and  so  on. 

The  Soviet  scientist  S.  G.  Chebanov  [12]  (in  1947)  and  the 
German  mathematician  W.  Fuchs  [11]  (in  1956)  found  that 
the  distribution  of  words  by  the  number  of  syllables  in  various 
languages  is  subject  to  a  certain  general  regularity.  We  shall 
present  here  the  main  idea  behind  the  discussions  in  Fuchs'  pa- 
per [11],  since  it  is  the  more  interesting  one  to  linguists  from 
the  standpoint  of  methodology. 

In scanning concrete texts in English and Latin, Fuchs found
that word distribution by number of syllables is very close for
various texts in one language, i.e., individual differences between
authors writing in the same language are insignificant;
but for different languages, the data are quite different.

These differences can be illustrated as in the graph in Figure
6. The number of syllables possible in each word in a given
language is plotted along the x-axis. The number of words having
a certain number of syllables is plotted along y, as a percentage
of the total number of words in the text, i.e., the probability
of a randomly selected word from the text being monosyllabic,
bisyllabic, trisyllabic, etc. For example, the curve for
Shakespeare shows that about 80 per cent of the words in the text
are monosyllabic, and less than 20 per cent are bisyllabic; the
curve for Sallust shows that the largest group of words (about
32 per cent) is bisyllabic, with somewhat fewer trisyllabic, etc.
It is quite apparent how closely similar are the probability-distribution
curves for various authors of English texts, and how
much the curves for Caesar's and Sallust's texts differ from them.

Figure 6. Number of Words of i Syllables as a Percentage
of the Total Number of Words in Text.

However, the curves for various languages had other features
in common, as well. The character of the experimentally obtained
curves has allowed the author, building on certain statements
in probability theory, to present the hypothesis that in
all languages studied, word distribution by syllables follows a
definite, general law. On the basis of this hypothesis, Fuchs obtained
an analytic expression by means of which one can calculate,
knowing only the average number of syllables in words in
a given language, what percentage of the text is monosyllabic,
bisyllabic, or other:

$$p_i = \frac{e^{-(\bar{i}-1)}\,(\bar{i}-1)^{i-1}}{(i-1)!}.$$

Here, i is the number of syllables in a word; ī is the average
number of syllables per word for a certain language; e is a
constant (the base of natural logarithms) equal to 2.718 . . . ; p_i
is the percentage of i-syllable words in the text, i.e., the probability
that a word randomly selected from text in a particular
language has i syllables.

Let us take an example. In order to know the proportion of
i-syllable (i = 1, 2, 3, . . . ) words in Russian text, according
to Fuchs, it is enough to know only one parameter: the average
value of the number of syllables per word. This is calculated by
a very simple experiment: For every word of a certain text, the
number of syllables is written, and these figures are added together
and the sum divided by the total number of words in
the text.^28 For Russian, the average number of syllables is 2.228
(Fuchs' data). Then, in order to find, for example, the proportion
of five-syllable words in Russian text, we substitute the corresponding
values in Fuchs' formula. Thus:

$$p_{i=5} = \frac{2.718^{-(2.228-1)}\,(2.228-1)^{4}}{4!} = 0.0358.$$

This  means  that  in  Russian  text,  3.58  per  cent  of  the  words 
have  five  syllables. 
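
The calculation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fuchs_probability` and `mean_syllables` are hypothetical, and the formula is taken in the form $p_i = e^{-(\bar{i}-1)}(\bar{i}-1)^{i-1}/(i-1)!$, with the mean 2.228 being Fuchs' figure for Russian as quoted above.

```python
import math


def mean_syllables(syllable_counts):
    """Average syllables per word, given one syllable count per word of a text."""
    return sum(syllable_counts) / len(syllable_counts)


def fuchs_probability(i, mean):
    """Proportion of i-syllable words predicted from the mean syllables per word."""
    if i < 1:
        raise ValueError("a word has at least one syllable")
    lam = mean - 1.0  # the formula depends only on (mean - 1)
    return math.exp(-lam) * lam ** (i - 1) / math.factorial(i - 1)


if __name__ == "__main__":
    mean_ru = 2.228  # Fuchs' value for Russian, as quoted in the text
    for i in range(1, 8):
        print(i, round(fuchs_probability(i, mean_ru), 4))
```

Summed over all word lengths, the predicted proportions add up to 1 and reproduce the mean that was put in, which is a convenient check on hand arithmetic. Evaluated directly in this form, the five-syllable proportion for a mean of 2.228 comes out near 0.028, so the printed figure of 0.0358 may be worth rechecking against Fuchs' original tables.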


^  One  must  bear  in  mind  that  in  order  to  obtain  sufficiently  precise  data,  it 
is  necessary  to  study  a  considerable  amount  of  text  (see  above  regarding  this). 
A comparison of the data obtained by calculations from the
text  and  the  formula  yields  good  results. 

A  condition  for  the  application  of  Fuchs'  formula  is  the  de- 
termination of  the  mean  number  of  syllables  per  word  for  each 
language  specified.  But  this  does  not  lessen  the  practical  value 
of  the  formula  in  the  least,  since  calculation  of  the  average 
number  of  syllables  is  incomparably  less  laborious  than  calcu- 
lation of  the  entire  distribution  of  words  by  number  of  syllables 
for  various  languages.  In  addition,  the  analytic  relationship  fa- 
cilitates a  comparative  analysis  of  word  distribution  by  syllables 
in  various  languages. 

4.3.  Application  of  Statistical  Methods  for  Studying  the 
Structure  of  Poetry 

Attempts  to  apply  statistical  methods  to  a  study  of  the  struc- 
ture of  poetry  have  already  been  made  by  Andrej  Belyj  [2]. 
The  studies  of  the  famous  Soviet  linguist  and  textologist  B.  V. 
Tomashevsky  remain  an  unsurpassable  example  of  this,  as  col- 
lected in  his  book  Poetry  [9].  In  order  to  acquaint  the  reader 
with  the  methodology  of  Tomashevsky's  statistical  research,  we 
shall  present  here  several  constructions  from  his  paper  on  The 
Iambic  Pentameter  of  Pushkin. 

As  is  known,  iambic  verse,  especially  that  of  Pushkin,  can 
have  a  very  widely  varied  rhythmical  structure,  determined  to 
a high degree by the distribution of pyrrhic feet (pairs of
syllables without stress). For example, in iambic tetrameter, in
addition to the canonical type, such as

Gorit  vostok  zareyu  novoj, 

the  most  diverse  variants  are  encountered:   with  the  pyrrhic 
for  the  first  foot,  as  in 

Ne dokuchal moral'yu strogoj,

or with the pyrrhic for the second and third feet, as in

I klanyalsya neprinuzhdenno,

and  others. 
The greater the number of feet in the iambic line, the more
possibilities  there  are  for  combining  pyrrhic  feet  with  other 
kinds  of  feet.  The  only  distinct  syllable  with  a  firmly  fixed  stress 
is  the  ultimate  syllable,  since  it  is  the  rhyming  one. 

In  attempting  to  give  a  general  description  of  the  peculiari- 
ties of  the  iambs  of  Pushkin,  B.  V.  Tomashevsky  conducted 
broad  statistical  research,  from  which  he  found  that  between 
the  number  of  feet  (pairs  of  syllables)  in  the  line  (x)  and  the 
average  number  of  pyrrhic  feet  per  line  of  poetry  (y)  there  ex- 
ists a  fully  determined  strict  relationship: 

y  =  0.28(x  -  1). 

This means that each hundred lines of iambic tetrameter will
contain an average of 100(0.28)(4 − 1) = 84 pyrrhic feet, while each
100 lines of iambic pentameter will contain 100(0.28)(5 − 1) = 112
pyrrhic feet, etc.; i.e., the number of pyrrhic feet is proportional
to the number of feet per line, minus the rhyming foot, since the
latter does not participate in the distribution of pyrrhic feet.

The formula y = 0.28(x − 1) agrees well with the experimental
data for iambic di-, tetra-, penta-, and hexameter, and
only slightly less well with trimeter.
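
As a small illustration, Tomashevsky's relationship can be tabulated for several meters. The sketch below assumes only the formula y = 0.28(x − 1) given in the text; the function name `expected_pyrrhics` and its parameters are hypothetical.

```python
def expected_pyrrhics(feet_per_line, lines=100, rate=0.28):
    """Average number of pyrrhic feet in `lines` lines of an iambic meter.

    The last (rhyming) foot is always stressed, so only feet_per_line - 1
    feet per line can be pyrrhic.
    """
    return lines * rate * (feet_per_line - 1)


if __name__ == "__main__":
    for name, feet in [("dimeter", 2), ("trimeter", 3), ("tetrameter", 4),
                       ("pentameter", 5), ("hexameter", 6)]:
        print(f"{name:<11}{expected_pyrrhics(feet):6.0f} pyrrhic feet per 100 lines")
```

These are averages predicted by the formula, not counts from any particular poem; the text itself notes that the fit is somewhat poorer for trimeter.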

But  the  rhythmic  variety  of  iambic  lines  of  more  than  two 
feet  is  determined  not  only  by  the  number  of  pyrrhics  but  also 
by  their  location,  i.e.,  where  they  occur  in  the  line.  Thus,  in 
iambic  trimeter  one  finds  lines  without  pyrrhics,  lines  with  a 
pyrrhic  foot  at  the  first  foot  (the  first  pair  of  syllables),  with  a 
pyrrhic  for  the  second  foot  (the  second  pair  of  syllables),  and 
with  two  pyrrhic  feet— on  the  first  and  second  pairs  of  syllables. 
Therefore,  it  seems  reasonable  to  describe  the  iambic  trimeter 
of  a  certain  poet  by  indicating  how  frequently  the  correspond- 
ing line  forms  are  encountered— in  other  words,  on  the  basis  of 
a  statistical  approach. 

For  example,  the  rhythmic  structure  of  Pushkin's  iambic  trim- 
eter may  be  characterized  in  the  following  manner. 

For  each  100  lines  of  iambic  trimeter,  there  are  300  distinct 
feet  that  could  be  stressed,  100  of  which  are  rhyming;  as  we 
have  already  said,  these  are  always  stressed.  This  means  that 
200  feet  participate  in  the  distribution  of  pyrrhics.  There  are, 
then,  200  possible  places.  Using  B.  V.  Tomashevsky's  formula, 
we can calculate how many of these will have pyrrhic feet: y =
100(0.28)(3 − 1) = 56. Thus, out of 200 places, 56 are pyrrhic,
and the others are stressed. It remains to be determined
at  which  places  they  occur.  One  cannot  tell  this  a  priori.  Toma- 
shevsky's  statistical  research  has  shown  that  out  of  144  stresses 
falling  on  the  first  and  second  feet,  about  40  occurred  on  the 
second  foot,  and  the  others  on  the  first.  This  means  that  most 
lines  have  the  pyrrhic  for  the  second  foot,  while  lines  with  a 
pyrrhic  for  the  first  foot,  or  with  two  pyrrhic  feet,  are  quite 
rare.^^ 

We  will  now  cite  several  more  complex  discussions,  in  which 
the  methods  of  probability  theory  are  applied  in  studying  the 
rhythmic  structure  of  iambic  caesural  pentameter. 

Iambic  caesural  pentameter  is  divided  by  the  caesura  into 
two  half-verses,  the  first  being  like  iambic  dimeter,  the  second 
like  trimeter.  Can  one  deduce  the  regularities  of  distribution 
of  pyrrhic  feet  for  iambic  caesural  pentameter  from  the  regu- 
larities for  dimeter  and  trimeter?  Or  does  iambic  pentameter 
have  its  own  unique  structure? 

We  shall  consider  the  first  half-verse  (see  Table  2,  column  1). 
Obviously,  it  cannot  be  equated  with  iambic  dimeter,  since  the 
latter  cannot  have  the  pyrrhic  for  the  second  foot  (the  rhyming 
one),  whereas  in  the  first  half-verse  of  iambic  pentameter,  there 
is  nothing  to  stop  the  pyrrhic  from  being  the  second  foot. 

Since the form ∪∪/∪∪/ is impossible for the first half-verse,
because by the rules of poetry at least one stressed word
precedes the caesura, the following three types occur for the
first half-verse:

(1) ∪′/∪′/     (2) ∪∪/∪′/     (3) ∪′/∪∪/

We  shall  consider  the  second  half-verse.  From  the  standpoint 
of  the  location  of  pyrrhic  feet,  it  is  fully  analogous  to  iambic 
trimeter:  The  same  four  variants  in  lines  are  possible  as  in 
iambic  trimeter: 

(1) ∪′/∪′/∪′/     (3) ∪′/∪∪/∪′/

(2) ∪∪/∪′/∪′/     (4) ∪∪/∪∪/∪′/


^^ The example for iambic trimeter was selected for simplicity. It was noted
above that the formula y = 0.28(x − 1), for iambic trimeter, does not agree very
well with experimental data. More precise data for iambic trimeter will be given
below.
TABLE 2

The Distribution of Stresses in Iambic Caesural Pentameter
(according to B. V. Tomashevsky)

Column 1: rhythmic scheme of each half-verse type
Column 2: frequency of this type

  Half-verse  Scheme       Example                    Frequency
  First       ∪′/∪′/       Vskrichal Odurj            p1  0.57
  First       ∪∪/∪′/       Iz glubiny                 p2  0.15
  First       ∪′/∪∪/       Krasavitsa                 p3  0.28
  Second      ∪′/∪′/∪′/    Skazal mudrets bradatyj    q1  0.361
  Second      ∪∪/∪′/∪′/    Pervonachal'ny nravy       q2  0.045
  Second      ∪′/∪∪/∪′/    Volshebnoyu krasoj         q3  0.591
  Second      ∪∪/∪∪/∪′/    Bez predugotovlen'ya       q4  0.001

Column 3: theoretical probability of the line
Column 4: experimental frequency of the line

  Line type   Theoretical probability   Experimental frequency
  p1q1        0.206                     0.208
  p2q1        0.054                     0.057
  p3q1        0.101                     0.097
  p1q2        0.026                     0.028
  p1q3        0.338                     0.335
  p2q2        0.006                     0.007
  p2q3        0.083                     0.086
  p3q2        0.013                     0.010
  p3q3        0.166                     0.173
  p1q4        0.006                     0.003
  p2q4        0.001                     0.003
  p3q4        0.003                     0.003

Total types: 12

Combining  the  three  forms  of  the  first  half-verse  with  the 
four  types  of  the  second  yields  twelve  theoretically  possible 
kinds  of  lines  for  iambic  caesural  pentameter.  Here,  the  choice 
of  the  first  half-verse  in  no  way  influences  the  choice  of  a  second. 
Therefore,  according  to  the  rules  of  probability  theory,  the  fre- 
quency of  each  of  the  twelve  theoretically  possible  types  of  lines 
must  be  equal  to  the  product  of  the  frequencies  predicted  for 
those  half-verses,  of  which  a  certain  type  of  line  is  composed.  If 
the  frequency  of  a  certain  kind  of  line  is  not  the  product  of  the 
predicted  frequencies  of  its  component  half-verses,  this  means 
that  they  are  not  independent  and  that  the  choice  of  the  first  is 
somehow  connected  with  the  choice  of  the  second. 

Tomashevsky's experiment (see Table 2) testifies to the
independence of the various kinds of half-verses. In column 1, the
rhythmic schemes of the half-verse types are listed. Column 2
shows  the  experimental  frequencies  of  the  types  of  first  half- 
verse  (designated  p  with  the  appropriate  indices)  and  of  the 
second  half-verse  (q,  with  appropriate  indices).  In  column  3, 
theoretical  probabilities  are  written  for  the  combination  of  the 
first  half-verse  with  the  second  (twelve  in  all),  as  calculated  on 
the  assumption  that  half-verse  types  are  independent,  i.e.,  by 
cross-multiplication  of  the  probabilities  for  the  corresponding 
kinds  of  half-verses;  and  in  column  4  are  written  the  experi- 
mental probabilities  of  the  same  combinations.  A  comparison  of 
experimental  with  theoretical  data  does  not  contradict  the 
hypothesis  of  independence.  Essentially,  this  means  that  the 
second  half-verse  of  iambic  pentameter  is  constructed  as  an  inde- 
pendent verse— an  iambic  trimeter— and  its  form  is  in  no  way 
connected  with  the  form  of  the  first  half-verse. 
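
The cross-multiplication described above can be sketched as follows. This is an illustration, not Tomashevsky's own procedure: the names are hypothetical, the half-verse frequencies are those of column 2 of Table 2, and the assignment of each experimental frequency to a particular combination of half-verse types is an assumption read off the table, which should be checked against the original.

```python
# Frequencies of the half-verse types (Table 2, column 2).
FIRST_HALF = {"p1": 0.57, "p2": 0.15, "p3": 0.28}
SECOND_HALF = {"q1": 0.361, "q2": 0.045, "q3": 0.591, "q4": 0.001}

# Experimental line frequencies (Table 2, column 4); the pairing with
# line types is an assumption, see the note above.
OBSERVED = {
    ("p1", "q1"): 0.208, ("p2", "q1"): 0.057, ("p3", "q1"): 0.097,
    ("p1", "q2"): 0.028, ("p1", "q3"): 0.335, ("p2", "q2"): 0.007,
    ("p2", "q3"): 0.086, ("p3", "q2"): 0.010, ("p3", "q3"): 0.173,
    ("p1", "q4"): 0.003, ("p2", "q4"): 0.003, ("p3", "q4"): 0.003,
}


def predicted_line_frequencies(first, second):
    """Under independence, P(line) = P(first half) * P(second half)."""
    return {(a, b): pa * qb for a, pa in first.items() for b, qb in second.items()}


PREDICTED = predicted_line_frequencies(FIRST_HALF, SECOND_HALF)

if __name__ == "__main__":
    for key in sorted(PREDICTED):
        diff = OBSERVED[key] - PREDICTED[key]
        print(key, f"predicted={PREDICTED[key]:.3f}",
              f"observed={OBSERVED[key]:.3f}", f"diff={diff:+.3f}")
```

Small differences between predicted and observed values are what one would expect if the half-verses are chosen independently; a formal judgment would require a significance test, which this sketch does not attempt.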

Thus,  in  studying  the  distribution  of  pyrrhic  feet  in  iambic 
caesural  pentameter,  it  is  sufficient  to  study  the  distribution  of 
pyrrhic  feet  in  the  first  half-verse,  while  for  the  second  half- 
verse,  one  can  use  the  data  for  iambic  trimeter.  Tomashevsky's 
application  of  the  rule  for  calculating  the  probability  of  simul- 
taneous occurrence  of  independent  events  made  it  possible  to 
simplify the work considerably. The formula y = 0.28(x − 1) and
statistical  data  on  the  distribution  of  pyrrhic  feet  in  various 
iambs  make  it  possible  to  calculate  mean  values  for  the  number 
and  location  of  stresses  for  any  iambic  meter.  Having  obtained 
such  data  for  Pushkin  and  other  poets,  we  can  construct  com- 
parable tables  from  which  one  can  judge  the  individual  pecu- 
liarities of  the  iambic  meter  of  various  poets.^° 

Since  the  distribution  of  metric  variants  even  within  an  iamb 
with  a  certain  number  of  feet  is  twice  as  individualized  for  the 
great  poets,  such  tables  can  be  used  also  for  heuristics. 

^^ Here we take into account, of course, not only the distribution of pyrrhic
feet but also the distribution of caesuras and several other characteristics. The
tables mentioned are presented in B. V. Tomashevsky's study, "The Iambic
Pentameter of Pushkin," in Pushkin: sovremennye problemy istoriko-literaturnogo
izucheniya [Pushkin: Contemporary Problems in Historical-Literary Study],
Kul'turno-prosvetitel'noe trudovoe tovarishchestvo "Obrazovanie," Leningrad,
1925.

We have discussed only a few examples of the application of
statistical methods to the description of individual language
facts.  It  seems  to  be  generally  true  for  the  studies  we  have  con- 
sidered that  by  treating  certain  language  phenomena  as  random 
(in  the  technical  sense)  and  by  studying  them  with  statistical 
methods,  the  authors  have  discovered  several  regularities  that 
could  not  have  been  found  by  another  approach. 

Choice  of  examples  was  dictated  by  the  necessity  of  avoiding 
the  use  of  the  complex  mathematical  apparatus  of  probability 
theory  in  explaining  them.  Those  desirous  of  becoming  more 
familiar  with  the  application  of  statistical  methods  in  studying 
various  language  phenomena  may  refer  to  the  following  in  ad- 
dition to  studies  referred  to  above:  L.  R.  Zinder  [4],  S.  Saporta 
and  D.  Olson  [36],  J.  B.  Carroll  [16],  C.  E.  Shannon  [37].  See 
also  [20],  [32]  (phonetics,  phonology,  the  statistics  of  letters 
and groups of letters); the works of P. Guiraud ([24], [25]);
N.  R.  French,  C.  W.  Carter,  Jr.,  and  W.  Koenig,  Jr.  [21];  V. 
Garcia  Hoz  [22];  H.  A.  Simon  [38];  V.  H.  Yngve  [46];  and 
others— [23],  [27],  [28]  (lexicology,  syntax,  questions  of  style). 
Also  interesting  are  the  articles  of  C.  B.  Williams  [45]  and  G.  U. 
Yule [47] on the length of the sentence in the works of various
authors.  Interesting  materials  are  contained  in  a  collection  of 
articles  on  speech  statistics  [4],  in  the  Theses  of  the  First  All- 
Union  Conference  on  Machine  Translation  [7]  and  in  those  of 
the  Conference  on  Mathematical  Linguistics  [8],  and  also  in 
the  bulletin  Mashinnyj  perevod  i  prikladnaya  lingvistika  [Ma- 
chine Translation  and  Applied  Linguistics]. 

For  an  introduction  to  probability  theory  and  mathematical 
statistics,  we  would  recommend  E.  S.  Venttsel's  work  [3]  and 
the  introduction  to  N.  Arley  and  K.  R.  Buch's  book  [1];  also 
A.  M.  Yaglom  and  I.  M.  Yaglom's  book,  Veroyatnost'  i  infor- 
matsiya  [Probability  and  Information]  [14].  All  linguists 
should  read  a  study  by  the  outstanding  Russian  mathematician 
A.  A.  Markov,  "Ob  odnom  primenenii  statisticheskogo  metoda" 
["An  Application  of  Statistical  Method"]  [6],  in  which  the  ne- 
cessity of  evaluating  reliability  in  linguistic  research  was  first 
demonstrated. 
CHAPTER VI


Information  Theory  and  the 
Study  of  Language 


1.  General  Remarks 

For  a  scientific  description  of  language  as  a  means  of  communi- 
cation, of  language  in  its  essential,  communicative  function, 
language  must  be  compared  with  other  systems  used  for  the 
transmission  of  information.  The  problem  was  long  ago  pre- 
sented in  this  manner  by  De  Saussure,  who  asserted  that  lan- 
guage must  be  studied  within  the  framework  of  a  science  of 
symbolic  systems— "general  semeiology"  (see  bibliography  to 
Chapters  I  and  II,  [9],  p.  30).  The  study  of  this  aspect  of  lan- 
guage can  now  be  put  on  an  exact  mathematical  basis,  thanks 
in  part  to  the  methods  of  information  theory. 

The  need  for  studying  language  by  the  methods  of  informa- 
tion theory  is  based  on  practical  necessity.  Language  is  one  of 
the  most  important  means  of  transmitting  information  in  hu- 
man society,  and  only  through  the  use  of  information  theory 
can  one  obtain  the  data  about  language  that  are  needed  for 
working  out  effective  means  of  transmitting  language  informa- 
tion by  telegraph,  telephone,  radio,  etc.  ([3],  [4],  [11]). 

The  first  attempts  to  study  language  by  the  methods  of  infor- 
mation theory  led  to  inflated  accounts  of  the  wide  perspectives 
that this theory opens to linguistics (see, for example, [26]).
Further research has led to a more sober evaluation of the
actual possibilities ([29], [19]). Information theory forces one
to  consider  language  as  a  code,  and  permits  several  properties 
of  this  code  to  be  studied,  using  statistical  models.  One  can 
form  judgments  regarding  the  fruitfulness  of  statistical  models 
of  language  based  on  the  concepts  of  information  theory,  these 
judgments  being  governed  by  the  degree  to  which  the  models 
can  be  applied  in  solving  "linguistic  problems  proper,"  i.e., 
problems  that  have  been  brought  to  light  in  the  course  of  the 
past  development  of  linguistics.  But  one  must  not  forget  that 
practical  application  of  linguistics  presents  new  problems  of  un- 
doubted importance. 

Before  speaking  of  the  linguistic  applications  of  information 
theory,  we  consider  it  necessary  to  present  several  of  its  concepts 
that  are  especially  essential  for  linguistics  (see  also  the  most 
popular existing presentations of information theory: [5], [10],
[11]).


2.  The  Concept  of  "Quantity  of  Information" 

Information  theory  is  an  outgrowth  of  mathematics  rather 
closely  related  to  statistics.  The  foundations  for  information 
theory  were  laid  in  1948  by  the  American  scientist  C.  E.  Shan- 
non [37].  The  theory  has  developed  rapidly  and  successfully 
in  the  last  few  years.  Information  theory  deals  with  the  proc- 
esses of  transmission  of  information  through  communications 
systems.  Related  to  communications  systems  are  not  only  such 
technical  apparatus  as  the  telephone,  the  telegraph,  and  radio, 
but  also  language  communication,  the  mechanism  of  reaction 
by  an  animal  to  an  external  stimulus,  systems  for  traffic  control, 
etc.  We  can  represent  any  communication  system  schematically 
as  in  Figure  7. 

Whatever  is  transmitted  over  a  communication  system  is 
called  a  message.  A  communication  system  consists  of  a  source 
initiating  the  message,  a  transmitter  transforming  the  message 
into  signals  that  can  be  transmitted  along  the  communication 
channel,  a  receiver  recreating  the  message  from  a  sequence  of 
signals,  and  a  recipient.  In  any  communications  system,  noise 
usually acts on the signal. By noise we mean any kind of random
error arising unavoidably during the transmission of a signal.
This is the idealized scheme. All of its parts are present in
one  form  or  another  in  any  act  of  communication. 

The  purpose  of  a  communication  system  is  to  transmit  infor- 
mation. The  concept  of  information  is  primitive,  undefined. 
The  decisive  notion  in  the  creation  of  the  theory  was  the  con- 
cept "quantity  of  information,"  because  this  provided  a  means 
for  establishing  a  quantitative  comparison  of  the  "essential" 
properties  of  characteristically  heterogeneous  messages. 

Figure 7. Schematic Representation of a Communication System.

The concept of "information" is usually associated with an
idea about some substantial communication of important news
to the receiver-recipient. But information theory does not treat
the quantity of information in a message as a function of its
substance  or  its  importance.  For  this  reason,  information  theory 
proposes  an  approach  to  communication,  and  to  the  informa- 
tion contained  in  it,  that  seems  unnatural  at  first  glance.  It  will 
be  easier  to  understand  the  point  of  view  of  this  theory  if  we 
note  that  information  theory,  including  all  its  present  applica- 
tions, was  at  first  closely  bound  up  with  the  practical  needs  of 
engineers  who  were  developing  communication  systems.  From 
the  communication  engineer's  point  of  view,  the  most  essential 
attributes  of  a  message  are  the  time  needed  to  transmit  it,  the 
complexity  of  the  apparatus  transmitting  the  signal,  etc. 
Whether  the  message  itself  communicates  a  brilliant  scientific 
discovery or asserts that 2 × 2 = 4, whether it warns of an
emergency in a factory or forecasts the weather, the problem
consists of transmitting the message in the most rapid and accurate
manner  to  the  recipient-receiver.  The  content  of  a  message,  its 
reliability,  importance,  etc.,  are  not  essential.  The  only  prop- 
erty of  a  message  that,  from  this  point  of  view,  is  essential  is  its 
statistical  structure.  We  shall  explain  this  further. 
If we abstract the content and qualitative facets of a message,
then  all  messages  resemble  one  another;  they  all  represent  some 
sequence  or  other  of  symbols.^  Such,  for  example,  are  language 
messages,  telegraphed  messages  in  Morse  code,  etc.  The  sym- 
bols composing  a  message  form  a  finite  set— the  alphabet. 

In  any  meaningful  message,  various  symbols  are  encountered 
with  various  frequencies,  so  that  the  relative  frequency  of  ap- 
pearance of  symbols  in  various  sections  of  homogeneous  mes- 
sages is  more  or  less  constant.  This  permits  one  to  consider  the 
occurrence  frequency  of  a  symbol  in  a  message  to  be  a  definite 
characteristic  of  this  symbol,  or  an  inherent  property  of  it.  This 
characteristic is called a probability^ (the probability of
appearance of a symbol i is designated p(i)).

Another  important  property  of  the  message  is  that  various 
groups  of  symbols  also  occur  in  it  with  different  frequencies; 
thus,  for  example,  if  the  message  is  a  written  text  in  Russian, 
the  group  of  letters  i'  will  occur  quite  frequently,  while  the 
combination o' is practically excluded. This means that in
addition to its probability, symbol i possesses a conditional
probability,^ i.e., the probability of its occurrence in a message
under the condition that the preceding symbol be some j
(designated $p_j(i)$). The probabilities and conditional probabilities
of symbols determine the statistical structure of the message.^
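
To make these two notions concrete, the following sketch estimates symbol probabilities and conditional probabilities from a sample message by counting relative frequencies. It is a rough illustration only: the function names are hypothetical, and a relative frequency in a finite sample is merely an estimate of a probability.

```python
from collections import Counter


def symbol_probabilities(message):
    """Estimate p(i) as the relative frequency of each symbol i."""
    counts = Counter(message)
    return {symbol: n / len(message) for symbol, n in counts.items()}


def conditional_probabilities(message):
    """Estimate p_j(i): how often i follows j, among all occurrences of j
    that have a following symbol. Keys are (j, i) pairs."""
    pair_counts = Counter(zip(message, message[1:]))
    preceding_counts = Counter(message[:-1])
    return {(j, i): n / preceding_counts[j] for (j, i), n in pair_counts.items()}


if __name__ == "__main__":
    sample = "..-." * 250  # 1,000 symbols: 750 dots and 250 dashes
    print(symbol_probabilities(sample))
    print(conditional_probabilities(sample))
```

In this artificial sample a dash is always followed by a dot, so the estimated conditional probability of a dot after a dash is 1, although the unconditional probability of a dot is only 0.75.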

^ Regarding messages that seem to have another nature, see p. 132.

^ For the sake of simplicity, one can say that a symbol's probability is its
relative frequency in a rather lengthy message (actually, such a representation of
probability is a very rough one; in the bibliography for Chapter V, see [5], a
popularized version of probability theory). For example, let a message consist of
a dot-dash sequence. A section of it 1,000 symbols long is studied. In this sample,
the dot occurs 750 times, and the dash occurs 250 times. Then one can say that
the probability of the dot is 0.75 and that of the dash is 0.25. Thus, the
probability is always expressed by a proper fraction. We note that the sum of the
probabilities of all symbols that can occur on a single trial is always equal to
unity. If the probabilities of all symbols are the same, then that for each symbol
equals 1/n, where n is the total number of symbols.

^ Regarding the concept of "conditional probability," see [5]. Conditional
probability can be calculated from the simple probability of individual symbols
and the probability of combination of symbols by the following formula:
$p_j(i) = p(j, i)/p(j)$, where p(j, i) is the joint (combinatorial) probability
of the pair of symbols (j, i).

^ Compare the concept of the "statistical structure of text" on p. 97, which has
a slightly different meaning.

If the message interests us only from the standpoint of its
statistical structure, then the process of creating a message can be
represented  as  follows.  Let  us  assume  that  a  series  of  trials  is 
being  made.  The  outcome  of  each  trial  consists  of  the  appear- 
ance of  one  of  the  symbols  of  the  alphabet,  so  that  the  occur- 
rence frequency  of  various  symbols  in  repeated  trials  is  deter- 
mined by  the  probability  of  the  symbols.  The  sequence  of  sym- 
bols appearing  during  this  process  is  the  message.  The  model 
described  represents  the  creation  of  a  message  (i.e.,  communi- 
cation) as  a  probabilistic  process.  Thus,  the  mechanism  for  cre- 
ating a  message  is  analogous,  for  example,  to  an  imaginary 
mechanism  that  casts  dice  and  writes  down  the  results  of  each 
cast (i.e., the numbers from 1 to 6), combining them into
infinite sequences. Although we use a probabilistic model to
describe the process of message production, we can hardly suppose
that  real  messages  are  formed  in  just  this  way.  The  model  does 
not  pretend  to  describe  the  real  process.  It  is  important  only 
that  the  statistical  structure  of  real  messages  be  exactly  the 
same  as  it  would  be  if  they  were  created  by  the  mechanism  de- 
scribed. 

The  reason  that  a  knowledge  of  the  statistical  structure  of 
messages  is  so  important  for  their  transmission  over  communi- 
cations systems  will  become  fully  clear  only  from  further  dis- 
cussion (see  Sec.  4.2);  for  now,  we  shall  confine  our  discussion 
to  just  a  short  preliminary  explanation.  If  one  is  transmitting 
a  message  in  which  certain  words  or  phrases  are  repeated  par- 
ticularly often,  then  it  is  desirable  to  tell  the  receiver-recipient 
of  these  messages  to  write  the  words  down  in  some  abbreviated 
form.  However,  such  abbreviation  is  possible  only  if  one  knows 
just  how  frequently  the  different  words  and  phrases  occur. 

If  the  probability  of  occurrence  of  a  symbol  at  a  certain  point 
in  the  message  does  not  depend  on  what  symbol  or  symbols  pre- 
ceded it,  then  one  can  say  that  the  symbols  in  a  message  are  in- 
dependent. An  example  of  a  message  that  consists  of  independ- 
ent symbols  might  be  the  "communication"  created  by  casting 
dice.  Among  messages  consisting  of  dependent  symbols,  some 
may  be  initiated  by  a  particular  type  of  probabilistic  process, 
namely, a Markov process (or Markov chain). In such a
message, the probability that a certain symbol i will occur depends
on the nature of the immediately preceding symbol j, and not
on that of any part of the message preceding j. If the
dependency between symbols extends further, to a longer chain
preceding the present symbol, then one speaks of a complex Markov
process;  in  any  Markov  process,  however,  the  number  of  sym- 
bols influencing  the  probability  of  occurrence  of  a  particular 
symbol  must  be  finite. 
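
A simple Markov process of this kind can be imitated by a short program. The sketch below is an assumption-laden illustration: the function name, the two-symbol alphabet, and the transition probabilities are all invented for the example and are not taken from any real language data.

```python
import random


def generate_markov_message(transitions, start, length, seed=None):
    """Produce a message in which each symbol depends only on its predecessor.

    `transitions[j][i]` is the conditional probability p_j(i) that symbol i
    follows symbol j.
    """
    rng = random.Random(seed)
    message = [start]
    while len(message) < length:
        previous = message[-1]
        symbols = list(transitions[previous])
        weights = [transitions[previous][s] for s in symbols]
        message.append(rng.choices(symbols, weights=weights, k=1)[0])
    return "".join(message)


# Hypothetical dot-dash source: a dash is never followed by another dash.
TRANSITIONS = {
    ".": {".": 0.6, "-": 0.4},
    "-": {".": 1.0},
}

if __name__ == "__main__":
    print(generate_markov_message(TRANSITIONS, start=".", length=40, seed=1))
```

A message of independent symbols, such as a record of dice casts, is the special case in which every row of the transition table is the same.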

Information  theory  develops  the  means  for  quantitatively 
comparing  different  messages  from  the  standpoint  of  the  indi- 
cated statistical  properties.  For  this  purpose,  the  concept  of  a 
"quantity  of  information  in  the  communicative  symbol"  is  also 
introduced.  At  first,  we  shall  deal  only  with  messages  consisting 
of  sequences  of  independent  symbols.  Let  us  consider  the  proc- 
ess whereby  an  addressee  receives  a  message.  Since  we  are  as- 
suming a  situation  wherein  more  or  less  similar  messages  are 
transmitted  frequently,  it  is  natural  to  believe  that,  even  before 
the  next  symbol  in  the  message  is  transmitted,  the  receiver 
knows  the  set  of  possible  symbols  (the  alphabet)  and  the  prob- 
ability of  each  symbol.  If  the  total  number  of  symbols  in  the 
alphabet  is  small  (e.g.,  two),  and  their  probabilities  are  quite 
different  (e.g.,  if  one  symbol's  probability  is  0.99  and  that  of 
the  other  is  0.01),  then  the  situation  before  reception  of  the 
symbol  contains  almost  no  indeterminacy  for  the  recipient;  he 
can  predict  in  advance  and  with  a  high  degree  of  accuracy  which 
of  the  symbols  will  be  transmitted.  The  indeterminacy  of  the 
situation  increases  as  the  number  of  symbols  in  the  alphabet  in- 
creases. Where  the  number  of  symbols  is  fixed,  it  is  greatest  if 
their  frequencies  are  equal.  Thus,  the  degree  of  indeterminacy 
before  reception  of  a  symbol  depends  on  the  number  of  possi- 
ble symbols  and  on  their  probabilities.  As  Shannon  has  shown, 
it  is  useful  to  assume  that  the  degree  of  the  indeterminacy  is 
connected  with  the  probability  of  the  symbols  by  the  following 
equation: 

$$H_1 = -\left[\,p(1)\log_2 p(1) + p(2)\log_2 p(2) + \cdots + p(n)\log_2 p(n)\,\right],$$

where $H_1$ is the degree, or value, of indeterminacy, n is the
number of symbols in the alphabet, and p(1), p(2), . . . , p(n) are
the probabilities of symbols from the 1st to the nth. Once a
symbol has been received, the indeterminacy is completely
eliminated. Therefore, it is natural to think that the degree of
indeterminacy characterizing a situation before reception of the
next symbol is a measure of the quantity of information in the
symbol received. Consequently, one can call the value of $H_1$ a
quantity of information, which can be measured by the
formula cited above. If we call the probability of a symbol p(i),
where i takes the values 1, 2, . . . , n, then this expression can be
written in an abbreviated form, using the summation sign:

$$H_1 = -\sum_{i=1}^{n} p(i)\log_2 p(i). \qquad (1)$$

In  the  special  case  where  all  symbols  are  equally  probable 
(as  in  the  case  of  dice),  formula  (1)  has  a  simpler  form: 

$$H_1 = \log_2 n. \qquad (2)$$

To see this, let the number of symbols in the alphabet be n;
then the frequency of each is 1/n, and formula (1) will contain
n equal components; hence,

$$H_1 = -n \cdot \frac{1}{n}\log_2\frac{1}{n} = -\log_2\frac{1}{n} = -(\log_2 1 - \log_2 n) = \log_2 n.$$
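
Formulas (1) and (2) are easy to check numerically. The following sketch assumes only formula (1); the function name `entropy` is a label chosen here for the quantity $H_1$, and terms with zero probability are skipped, following the usual convention that they contribute nothing.

```python
import math


def entropy(probabilities):
    """Quantity of information per symbol, H_1, by formula (1), in binary units."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(entropy([0.5, 0.5]))    # two equally probable symbols
    print(entropy([0.99, 0.01]))  # two symbols, one almost certain
    print(entropy([1 / 6] * 6))   # a fair die
    print(math.log2(6))           # formula (2) for n = 6
```

For equally probable symbols the result agrees with formula (2), and for the alphabet with probabilities 0.99 and 0.01 it is far below one binary unit, in line with the remark that such a situation contains almost no indeterminacy.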

When  one  speaks  of  the  quantity  of  information  in  a  symbol 
from  a  message,  he  does  not  mean  the  information  in  the  con- 
crete symbol  (e.g.,  in  the  letter  a  of  a  message  written  in  the 
letters  of  the  Russian  alphabet).  From  the  standpoint  of  infor- 
mation theory,  such  a  question  as:  "What  information  is  con- 
tained in  the  letter  a?"  is  meaningless.  Only  the  information 
in  the  average  symbol  is  being  measured,  i.e.,  the  average  quan- 
tity of  information  in  one  symbol  as  it  is  frequently  iterated  in 
the  message;  in  fact,  since  the  amount  of  information  is  being 
compared  with  the  value  of  indeterminacy,  this  amount  really 
characterizes  the  situation  before  reception  of  the  symbol,  rather 
than the symbol itself.⁵


⁵ Generally speaking, from formula (1), one can derive an expression for the
amount of information in a particular concrete symbol $a_i$:

$$H_{a_i} = -\log_2 p(a_i) = \log_2 \frac{1}{p(a_i)}$$

(hence, the widespread expression "message containing little information" = "a
message possessing high probability"). However, the value $H_{a_i}$ can hardly have
any substantial application.

To be able to measure the value of the information, one must
have  not  only  a  method  of  measuring  (as  defined  by  formulas 
(1)  and  (2) ),  but  also  a  unit  of  measurement.  The  latter  is 
selected  arbitrarily  (analogously  to  the  way  in  which,  for  ex- 
ample, a  unit  for  measuring  length,  weight,  or  other  physical 
magnitudes  is  selected).  The  unit  of  quantity  of  information 
is  the  information  derived  from  a  symbol  in  a  message  if  the 
alphabet  consists  of  two  symbols  having  equal  probability;  this 
unit  is  called  a  binary  unit  (abbreviated  b.u.,  or  simply  u). 

The  value  defined  by  formula  (1)  possesses  several  impor- 
tant properties: 

(1) The value of $H_1$ is never negative (the probability is expressed
in fractions, and the logarithm of a fraction is a negative
number; but a minus sign is used before the parenthesis).

(2) If the alphabet consists of only one symbol, then $H_1 = 0$.
(If there can only be one result for any trial, then its probability
is 1; thus, we obtain $H_1 = -(1)(\log_2 1) = 0$.)

(3) For a fixed number of symbols, $H_1$ is maximal when the
probabilities  of  all  symbols  are  equal.  This  property  can  be 
illustrated  by  the  following  example.  Let  the  alphabet  consist 
of  four  characters.  If  the  probabilities  of  all  four  are  equal  (i.e., 
if  the  probability  of  each  symbol  is  0.25),  then  according  to 
formula  (2): 

$$H_1 = \log_2 4 = 2\ u.$$

If  the  probabilities  are  not  equal,  the  quantity  of  information 
for  the  same  number  of  symbols  will  be  less.  Thus,  for  the  dis- 
tribution of  probabilities  1/2,  1/4,  1/8,  and  1/8,  the  quantity 
of information per symbol equals:⁶

$$H_1 = -[(1/2)(-1) + (1/4)(-2) + (1/8)(-3) + (1/8)(-3)] = 1/2 + 1/2 + 3/8 + 3/8 = 1.75\ u.$$

(4)  If  the  message  consists  of  a  sequence  of  independent 
symbols,  the  amount  of  information  in  a  pair  of  adjacent  sym- 
bols is  equal  to  the  sum  of  the  values  expressing  the  amount  of 
information  in  each  symbol.  Let  the  number  of  symbols  in  the 
alphabet equal N. Let us assume, for the sake of simplicity,


⁶ For simplicity in calculation, one can use a table of $p(i) \log_2 p(i)$; see [11],
pp. 305-308.

that the symbols have equal probabilities, so that the quantity
of information in each symbol will be $\log_2 N$. Since the symbols
are independent, the number of possible different pairs is N · N
(each of the N symbols in the first place may be combined with
any of the N symbols in the second place). If we consider a
pair of symbols as one symbol, then according to formula (2),
the amount of information in the pair is $\log_2 (N \cdot N)$; but
$\log_2 (N \cdot N) = \log_2 N + \log_2 N$. Thus, the values characterizing the
quantity of information in independent symbols can be added
together to obtain the value of the information
contained in the combination of symbols. This property of the
measure of information is called the property of additivity. If
we measured the information not by the logarithm of the number
of symbols but, for example, by the number of symbols,
then the information in a pair of independent symbols would
equal not the sum but the product of the information in each.
Suppose one had to determine the amount of information in
not just one symbol but in a combination of m symbols. The
property of additivity allows us to define the quantity of information
$H^{m}$ in a combination of m symbols under the condition
of independence of the symbols by the following simple formula:

$$H^{m} = mH_1. \tag{3}$$
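
The four properties can be verified on the four-character alphabet used above. The sketch below is a minimal illustration; the helper `entropy` and the variable names are assumptions introduced here, and the additivity check is run on the unequal distribution 1/2, 1/4, 1/8, 1/8, a case the text does not work through but for which property (4) is also known to hold when the symbols are independent.

```python
import math
from itertools import product

def entropy(probs):
    """Formula (1), in binary units."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

skewed_probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]

single = entropy([1.0])          # property (2): one-symbol alphabet
equal = entropy([0.25] * 4)      # property (3): equal probabilities
skewed = entropy(skewed_probs)   # property (3): unequal probabilities

# Property (4): for independent symbols the probability of a pair is the
# product of the probabilities of its members.
pair = entropy([p * q for p, q in product(skewed_probs, repeat=2)])

# Formula (3) for a combination of m independent symbols.
m = 3
triple = entropy([math.prod(combo) for combo in product(skewed_probs, repeat=m)])

print(equal, skewed, pair, triple)
```

Under these assumptions the pair carries twice, and the triple three times, the information of a single symbol, in agreement with formula (3).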

The  four  properties  enumerated  above  would  seem  to  agree 
with  our  intuitive  ideas  about  the  measurement  of  informa- 
tion: It  seems  natural  that  the  number  defining  the  amount  of 
information  should  be  positive,  that  the  quantity  of  information 
in  a  communication  consisting  of  repetition  of  the  same  single 
symbol should be zero,⁷ that the information in independent
symbols  can  be  totaled.  In  our  presentation,  we  first  introduced 


⁷ The following selection from L. Carroll's Alice in Wonderland is a curious
example  of  this  position.  Alice  got  lonesome  and  had  no  one  to  talk  to  but  her 
cat.  But  how  is  one  supposed  to  hold  a  conversation  with  a  cat?  ".  .  .  If  they 
would  only  purr  for  'yes,'  and  mew  for  'no,'  or  any  rule  of  that  sort,  ...  so  that 
one  could  keep  up  a  conversation!  But  how  can  you  talk  with  a  person  if  they 
always  say  the  same  thing?"  Information  theory  would  evaluate  the  quantity  of 
information  in  a  cat's  message  exactly  as  Alice  did. 

[The quote the author gives is incorrectly attributed to "Alice in Wonderland";
it appears in chapter 12 of Through the Looking-Glass and What Alice
Found There.—Tr.]

a definition of the quantity of information and then showed
how  our  intuitive  ideas  concerning  the  measurement  of  infor- 
mation were  reflected  by  it.  Obviously,  however,  the  agreement 
is  strictly  superficial;  the  properties  enumerated  corresponded 
to  demands  that  could  be  imposed  on  the  measurement  of  in- 
formation, as  reflected  in  formula  (1).  Thus,  the  requirement 
corresponding to property (1), the positive character, necessitates
the minus sign; the requirement emerging from property
(4), the possibility of combining information, causes us to introduce
logarithmic measurement,⁸ etc.

One  of  the  most  important  properties  of  the  expression  pro- 
posed by  Shannon  as  a  definition  of  the  concept  of  a  "quantity 
of  information"  consists  of  its  being  very  similar  to  the  expres- 
sion used  to  define  an  important  concept  in  thermodynamics— 
the  concept  of  "entropy."  The  difference  between  these  expres- 
sions lies  only  in  the  fact  that  in  the  expression  for  entropy  the 
minus  sign  is  not  present,  i.e.,  entropy  always  has  a  negative 
value.  This  similarity  in  the  expressions  is  not  accidental;  rather, 
it  reflects  the  internal  identity  of  these  concepts.  In  physics,  en- 
tropy is  usually  described  as  a  measure  of  the  random  character 
of  the  structure  of  a  physical  system  (for  example,  gas  con- 
tained in  a  sealed  vessel).  Another  very  important  approach  to 
understanding  physical  entropy  has  become  possible  (see  [40], 
p.  95,  notes,  and  [1],  pp.  16-18  and  200-212):  No  physical  sys- 
tem is  fully  defined;  while  we  can  determine  such  properties 
of  the  system  as  its  temperature  and  pressure,  we  cannot  define 
the  exact  location  and  velocity  of  all  the  molecules  composing 
the  system,  even  though  these  very  phenomena  are  what,  in  the 
final  analysis,  cause  both  the  temperature  and  the  pressure.  En- 
tropy can  be  thought  of  as  a  measure  of  insufficient  informa- 
tion, i.e.,  a  measure  of  the  inexactness  of  our  knowledge  con- 
cerning the  position  and  velocity  of  individual  molecules,  after 
all  possible  measurements  have  been  made. 

Because of the similarity of the expressions, the value $H_1$ is
often called entropy (although it would be more exact to speak
of "negative entropy").⁹ We shall use the words "quantity, or
amount,  of  information"  and  "entropy"  as  synonyms. 


⁸ The fact that the logarithm is taken with base 2 is connected with the choice
of the unit of quantity of information.

⁹ [$H_1$ is infrequently called negentropy in English.—Tr.]


3. The Amount of Information in a Sequence of Dependent
Symbols

Up  to  now,  we  have  considered  only  the  special  case  of  mes- 
sages consisting  of  sequences  of  independent  symbols.  How- 
ever, messages  in  which  symbols  are  dependent  are  much  more 
frequently  encountered.  An  example  of  such  a  message  is  a  se- 
quence of  letters  in  a  text,  or  a  sequence  of  dots,  dashes,  and 
spaces,  as  in  Morse  code,  etc. 

Let  us  first  take  the  case  of  a  message  in  which  statistical  de- 
pendencies exist  only  between  two  adjacent  symbols  (i.e.,  a 
message involving only a simple Markov process). Let $a_1$, $a_2$,
. . . , $a_n$ be the symbols of the alphabet; we choose any particular
symbol, which we relabel $a_1$. The indeterminacy characterizing
the situation before the occurrence of the next symbol
can be calculated using formula (1), except that in place of the
probabilities p(i), we must insert conditional probabilities for
the symbols, to wit: the probabilities of the symbols when the
preceding symbol was $a_1$, namely, $p_{a_1}(i)$. The value $H_{a_1}$ (the
amount of information in a symbol for the fixed condition $a_1$)
is then calculated as follows:

$$H_{a_1} = -\sum_{i=1}^{n} p_{a_1}(i) \log_2 p_{a_1}(i). \tag{4}$$

Clearly, the value $H_{a_1}$ for the same message will vary with different
selections of $a_1$. Thus, if we take a written text in English
as our sample of a message and consider the symbol $a_1$ to be
the letter q, then the value $H_{a_1}$ will be nearly zero, since the
situation "after q" is hardly indeterminate at all; one can predict
with assurance that the following letter will be u (q is almost
always followed by a u). Elsewhere, the degree of indeterminacy
is considerably higher. The average indeterminacy
arising  for  all  situations  in  which  one  symbol  in  the  message  is 
already  known  is  called  conditional  information,  or  conditional 
entropy,  and  is  calculated  using  the  following  formula: 

$$H_2 = p(a_1) H_{a_1} + p(a_2) H_{a_2} + \cdots + p(a_n) H_{a_n}, \tag{5}$$

where $H_2$ is the conditional entropy, $H_{a_1}$, $H_{a_2}$, . . . , $H_{a_n}$ are
entropies for fixed conditions, while $p(a_1)$, $p(a_2)$, . . . , $p(a_n)$ are
the probabilities of the symbols $a_1$, $a_2$, . . . , $a_n$, respectively.

In information theory, the basic characteristic of a message
is usually considered to be only the value $H_2$; however, in linguistic
applications the value $H_{a_i}$ is also used, as we shall see
in Sec. 7.4.
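
Formulas (4) and (5) can be traced on a small example. The two-symbol source below is hypothetical: the symbols "a" and "b", the conditional probabilities in `cond`, and the symbol probabilities in `p` are all assumptions chosen here so that the arithmetic is easy to follow; they are not data from the text.

```python
import math

def entropy(probs):
    """Formula (1), in binary units."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# cond[a][b]: assumed probability that symbol b follows symbol a.
cond = {
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.5, "b": 0.5},
}

# Symbol probabilities chosen to be consistent with these transitions
# (the long-run frequencies of such a chain).
p = {"a": 5 / 6, "b": 1 / 6}

# Formula (4): entropy for each fixed preceding symbol.
h_fixed = {a: entropy(row.values()) for a, row in cond.items()}

# Formula (5): conditional entropy as the probability-weighted average.
h2 = sum(p[a] * h_fixed[a] for a in cond)

h1 = entropy(p.values())  # for comparison: symbols treated as independent
print(h_fixed, round(h2, 4), round(h1, 4))
```

In this sketch the situation "after a" is far less indeterminate than the situation "after b", and the average $H_2$ comes out below $H_1$, as the discussion of statistical dependence suggests it should.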

Sometimes it is simpler to calculate the value of the conditional
entropy in another fashion; if one knows the probabilities
p(j, i) for all pairs of symbols in the message, then one can
calculate the value of $H^{2}$, the quantity of information in a pair
of symbols, using formula (1):

$$H^{2} = -\sum_{j=1}^{n} \sum_{i=1}^{n} p(j, i) \log_2 p(j, i).$$

Afterwards, the value of $H_2$ can be defined:

$$H_2 = H^{2} - H_1, \tag{6}$$

where $H_1$ is the amount of information in the same message, as
calculated from formula (1). It follows from formula (6) that
the conditional entropy is equal to the difference between the
information in a pair of symbols (the value of $H^{2}$) and the information
in one symbol, as calculated on the assumption that
the symbols are independent (the value of $H_1$).
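
Formula (6) can be compared with the direct calculation by formulas (4) and (5). The sketch below assumes a hypothetical table of pair probabilities `joint` for a two-symbol alphabet; the table and all names are illustrative assumptions, not values from the text.

```python
import math

def entropy(probs):
    """Formula (1), in binary units."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Assumed pair probabilities p(j, i): first symbol j, second symbol i.
joint = {
    ("a", "a"): 9 / 12, ("a", "b"): 1 / 12,
    ("b", "a"): 1 / 12, ("b", "b"): 1 / 12,
}

# Probability of each symbol, found by summing the pairs that begin with it.
p = {}
for (j, _), pji in joint.items():
    p[j] = p.get(j, 0.0) + pji

h_pair = entropy(joint.values())  # information in a pair of symbols
h1 = entropy(p.values())          # information in one symbol
h2 = h_pair - h1                  # formula (6)

# Cross-check: formulas (4) and (5) applied to the same table.
h2_direct = 0.0
for a in p:
    row = [pji / p[a] for (j, _), pji in joint.items() if j == a]
    h2_direct += p[a] * entropy(row)

print(round(h2, 4), round(h2_direct, 4))
```

The two routes give the same number here, which is the content of formula (6) for this assumed table.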

One  can  calculate  the  conditional  entropy  in  messages  anal- 
ogously, where  statistical  dependencies  are  distributed  not  just 
among  two  symbols  but  over  three  or  more.  For  a  sequence  of 
symbols,  where  statistical  dependencies  exist  only  between  two 
adjacent  symbols,  the  fundamental  characteristic  is  the  value  of 
$H_2$, conditional entropy of the first order; if dependencies exist
among as many as three symbols, then one must define the value
of $H_3$, conditional entropy of the second order, for complete
characterization of the sequence; in messages where dependency
extends over m symbols, a value $H_m$, i.e., a conditional
entropy of the (m - 1)th order, must be defined.

Let  dependency  be  distributed  among  m  symbols.  We  shall 
designate a certain ith combination of m letters by $B_i$; then, the
probability of the combination $B_i$ will be designated $p(B_i)$.
The amount of information $H^{m}$ in a combination of m dependent
symbols is equal to

$$H^{m} = -\sum_{i=1}^{n} p(B_i) \log_2 p(B_i),$$

where n is the number of different combinations of m symbols.
The conditional entropy $H_m$ of the (m - 1)th order equals
the difference between the quantity of information in the combination
of m symbols and that in a combination of m - 1
symbols, i.e.,

$$H_m = H^{m} - H^{m-1}. \tag{7}$$

In general, a message in which the statistical bonds extend
over a great distance can be characterized by the value $H_0$ (the
amount of information if all symbols are equally probable), by
the value $H_1$ (the amount of information if all symbols are independent),
by the value $H_2$ (the amount of information if dependency
extends only to two adjacent symbols), by $H_3$, etc.
The values $H_0$, $H_1$, $H_2$, etc., represent sequential approximations
to the value H, the real amount of information per symbol.
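
The first three approximations can be estimated from a sample of text. The sketch below is a rough illustration only: the sample sentence, the function names, and the wrap-around pairing of the last symbol with the first (adopted here so that the pair counts and the single-symbol counts agree exactly) are all assumptions, and a sample this short gives crude estimates.

```python
import math
from collections import Counter

def entropy_from_counts(counts):
    """Formula (1), with probabilities estimated as relative frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def approximations(text):
    """Return estimates of H_0, H_1, and H_2 for the given text."""
    h0 = math.log2(len(set(text)))           # all symbols equally probable
    h1 = entropy_from_counts(Counter(text))  # symbols independent
    # Adjacent pairs; the last symbol is paired with the first (assumption).
    pairs = Counter(zip(text, text[1:] + text[0]))
    h_pair = entropy_from_counts(pairs)      # information in a pair
    h2 = h_pair - h1                         # formula (6)
    return h0, h1, h2

sample = "the quick brown fox jumps over the lazy dog and the cat"
h0, h1, h2 = approximations(sample)
print(round(h0, 3), round(h1, 3), round(h2, 3))
```

Each successive value is no larger than the one before it, since each takes one more kind of statistical limitation into account.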

Thus,  the  amount  of  information  increases  with  an  increase 
in  the  number  of  different  symbols  in  the  message,  whereas  it 
decreases,  for  a  given  number  of  symbols,  with  the  presence  of 
statistical  limitations  in  the  message  (i.e.,  with  inequality  of 
probabilities  and  presence  of  dependencies  among  symbols). 
Sometimes  it  is  necessary  to  compare  different  messages  from 
the  standpoint  of  the  magnitude  of  the  statistical  limitations. 
For this purpose, the concepts of relative entropy $H_{\mathrm{rel}}$ and of
redundancy R (see [38]) are introduced. These values are calculated
as follows:

$$H_{\mathrm{rel}} = \frac{H}{H_{\mathrm{max}}}, \tag{8}$$

where  H  is  the  real  quantity  of  information  in  a  symbol,  and 
$H_{\mathrm{max}}$ is the maximal possible amount for a given number of symbols
($H_{\mathrm{max}} = H_0 = \log_2 n$, where n is the number of symbols in
the alphabet). Redundancy is a value directly connected with
the relative entropy:

$$R = 1 - H_{\mathrm{rel}}. \tag{9}$$

Relative entropy and redundancy are expressed as percentages.
If all the symbols in a message are equally probable and independent,
redundancy equals zero; the greater the statistical
limitations, the higher the redundancy.¹⁰
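
Formulas (8) and (9) amount to two lines of arithmetic. In the sketch below the function name and the use of the four-letter distribution 1/2, 1/4, 1/8, 1/8 are assumptions made for illustration, and H is taken to be $H_1$, i.e., the symbols are assumed independent.

```python
import math

def entropy(probs):
    """Formula (1), in binary units."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def relative_entropy_and_redundancy(h, n):
    """Formulas (8) and (9) for entropy h and an alphabet of n symbols."""
    h_max = math.log2(n)   # H_max = H_0 = log2 n
    h_rel = h / h_max      # formula (8)
    return h_rel, 1 - h_rel  # formula (9)

probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
h_rel, redundancy = relative_entropy_and_redundancy(entropy(probs), len(probs))
print(f"{h_rel:.1%} {redundancy:.1%}")  # prints "87.5% 12.5%"
```

For four equally probable symbols the same function returns a redundancy of zero.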


¹⁰ Additional explication of the concept of "redundancy" can be given after
questions regarding encoding have been considered; see Sec. 4.2.


4. Some Transformations of Messages

4.1   Quantization 

Whenever  we  describe  the  process  of  transmission  of  infor- 
mation, we  proceed  from  the  assumption  that  a  message  is  dis- 
crete, i.e.,  that  it  is  a  sequence  of  individual  symbols  selected 
from  a  finite  alphabet.  In  addition,  there  exist  messages  seem- 
ingly possessing  a  basically  different  nature.  Take  the  telephone, 
for  example.  The  functional  principle  of  telephone  apparatus 
lies  in  the  fact  that  sound  waves  are  transformable  into  elec- 
tric signals  that  can  be  transmitted  over  wires.  The  amplitude 
of  the  sound  wave  can  vary  continuously;  it  takes  on  a  countless 
number  of  different  values,  since  the  values  of  the  amplitude 
during  consecutive  moments  of  time  can  differ  very  little  from 
each  other.  But  a  continuous  message  cannot,  in  practice,  be 
transmitted  in  an  unchanging  form;  in  the  first  place,  it  is  im- 
possible to  transmit  every  momentary  value  of  the  amplitude 
of  a  sound  wave— they  can  be  transmitted  only  at  definite  in- 
tervals; in  the  second  place,  it  is  impossible  to  achieve  an  abso- 
lutely exact  measurement  of  the  value  of  the  amplitude,  and 
the  value  is  noticeably  coarsened  during  transmission.  The  op- 
erations performed  in  all  instances  of  transmission  of  continu- 
ous messages  bear  the  name  of  quantization. 
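
The two operations just described, transmission at definite intervals and coarsening of each value, can be sketched as follows. Everything specific in the example is an assumption made here: a sine wave stands in for the sound-wave amplitude, the scale has five equally spaced levels between -1 and 1, and the value is replaced by the index of the nearest level.

```python
import math

def quantize(signal, n_samples, sample_interval, levels, lo=-1.0, hi=1.0):
    """Sample `signal` at fixed intervals and map each value to a level index."""
    step = (hi - lo) / (levels - 1)   # spacing between adjacent scale levels
    indices = []
    for k in range(n_samples):
        value = signal(k * sample_interval)       # only at definite intervals
        index = round((value - lo) / step)        # nearest level on the scale
        index = max(0, min(levels - 1, index))    # keep within the scale
        indices.append(index)
    return indices

def wave(t):
    """Assumed stand-in for a continuously varying amplitude."""
    return math.sin(2 * math.pi * t)

indices = quantize(wave, n_samples=8, sample_interval=0.125, levels=5)
print(indices)  # prints [2, 3, 4, 3, 2, 1, 0, 1]
```

The output is a sequence of symbols drawn from a five-symbol alphabet, so the continuous message has become a discrete one.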

Quantization  occurs  not  only  in  technical  communications 
systems;  analogous  processes  occur  during  transfer  of  informa- 
tion by  human  beings  (see  Sec.  5.3). 

The  essence  of  quantization  can  be  represented  more  pre- 
cisely by  the  following  example:  If  one  takes  a  discrete  scale 
such  that  an  initially  continuous  change  in  some  value  is  broken 
up  into  a  series  of  discrete  values,  one  can  then  transmit,  in  se- 
lected intervals  of  time,  not  the  actual  magnitude  of  the  value, 
but  rather  that  which  corresponds  to  the  nearest  level  specified 
on  the  scale.  Thus,  the  value  is  "approximated"  with  a  specified 
degree  of  accuracy,  and  transmission  of  the  communication  is 
once again reduced to the transmission of numerical values chosen
from a finite alphabet.

4.2 Coding

The  transmission  of  messages  over  great  distances  often  de- 
mands a  change  in  their  form;  e.g.,  a  telegram,  originally  writ- 
ten as  a  sequence  of  letters,  is  transformed  into  a  sequence  of 
dots  and  dashes  and  then  into  a  sequence  of  electric  impulses. 
Such  transformations  are  called  encoding.  There  are  various 
types  of  codes.  A  code  is  characterized  as  an  alphabet  of  ele- 
mentary symbols  with  rules  for  their  combination.  A  set  of 
symbols  in  one  code  directly  related  to  a  symbol  or  combina- 
tion of  symbols  in  another  code  is  called  a  code  combination. 
There  are  codes  in  which  all  the  combinations  are  the  same 
length. Such codes are called regular (Morse code is not regular;
the length of the combination of symbols for e (".") is one,
while that for o is three: "---"). Aside from
the length of a code combination, one main characteristic of
a code is the total number of symbols in the alphabet. Quite
frequently a code is used whose alphabet consists of only
two symbols. Such codes are called binary.

One  can  take  advantage  of  the  statistical  structure  of  a  mes- 
sage and  use  codification  to  transmit  the  latter  in  the  shortest 
possible  time.  Let  us  consider  the  following  example  (from 
Shannon  [37];  treated  in  greater  detail  in  [5]  and  [8] ). 

Suppose  we  have  a  message  in  an  alphabet  consisting  of  four 
letters  with  the  probabilities  1/2,  1/4,  1/8,  and  1/8,  respectively. 
If  one  encodes  this  message  with  a  regular  binary  code  in  which 
each  code  combination  is  two  symbols  long,  then  a  section  of 
the  message  containing  1,000  letters  will  consist  of  2,000  code 
symbols.  One  can  also  encode  this  message  with  another  code, 
an  irregular  one,  giving  the  first  letter  the  combination  0,  the 
second 10, the third 110, and the fourth 111.¹¹ If, in a communication
1,000 letters in length, the first letter occurs about 500
times, the second 250, and the third and fourth 125 times each,
then the total length of the message when encoded in the second


¹¹ Since our code is irregular, and there is no special combination for spaces,
the construction of code combinations must satisfy the following requirement:
No combination can be the beginning of another, or else it will be impossible
in decoding to separate unambiguously the parts of a message that correspond
to code combinations. For this reason, one cannot use the combinations 1, 11,
01, etc., in this code.

way will be about 1,750 symbols (500·1 + 250·2 + 125·3·2),
i.e., 250 symbols less than if it were encoded in the first
way. Consequently, the second code makes it possible to transmit
a message in less time.
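
The irregular code of this example can be tried out directly. In the sketch below the letter names A, B, C, D and the function names are assumptions introduced for illustration; only the four code combinations and the letter counts come from the example.

```python
# The four code combinations of the example; the letter names are assumed.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(message):
    """Replace each letter by its code combination."""
    return "".join(CODE[letter] for letter in message)

def decode(bits):
    """Read symbols until they form a code combination, then start afresh."""
    inverse = {combination: letter for letter, combination in CODE.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # no combination is the beginning of another
            letters.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing symbols do not form a code combination")
    return "".join(letters)

message = "A" * 500 + "B" * 250 + "C" * 125 + "D" * 125
encoded = encode(message)
print(len(encoded), 2 * len(message))  # prints "1750 2000"
```

Decoding needs no separator between combinations, because none of them is the beginning of another.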

Thus  arises  the  problem  of  optimal  codes,  i.e.,  codes  that 
would  guarantee  the  possibility  of  transmitting  a  message  with 
a  given  quantity  of  information  using  the  shortest  possible  se- 
quence of  symbols.  The  basic  principles  for  constructing  opti- 
mal codes  are  reducible  to  the  following.  If  a  message  consists 
of  independent  symbols  not  having  equal  probabilities,  then 
the  optimal  code  must  assign  to  the  most  frequent  symbol  the 
shortest  code  combination,  and  conversely:  The  least  frequent 
must  be  assigned  the  longest  combination  (as  in  our  example). 
If  the  symbols  in  a  message  are  not  independent,  then  it  would 
be  wise  to  substitute  code  combinations,  not  for  individual  sym- 
bols, but  for  groups  of  symbols  in  the  output  message  (since,  ow- 
ing to  the  dependency  between  symbols,  various  groups  will  be 
encountered  with  varying  frequencies,  and  some  might  not  even 
occur  at  all). 

An  important  result  of  information  theory  is  that  it  permits 
one  to  determine  how  far  he  can  go  in  shortening  a  message 
through  the  use  of  intelligent  encoding.  One  of  Shannon's  the- 
orems says  that  the  limit  to  which  this  reduction  of  the  length 
of  a  message  can  be  carried  with  binary  encoding  is  defined  by 
the  quantity  of  information  in  the  message.  Thus,  in  our  ex- 
ample, the  amount  of  information  per  symbol  is 

$$H_1 = -\left(\frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{4} \log_2 \frac{1}{4} + \frac{1}{8} \log_2 \frac{1}{8} + \frac{1}{8} \log_2 \frac{1}{8}\right) = 1.75\ u.$$

In a message 1,000 letters long, the quantity of information is
1,750 u, and the second code for this message is optimal.¹²
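
One standard way of building a code of this kind is Huffman's procedure, which the text does not name; it is substituted here purely as an illustration. The sketch computes only the lengths of the code combinations and compares their average with the quantity of information per symbol. The function names are assumptions.

```python
import heapq
import math

def entropy(probs):
    """Formula (1), in binary units."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Lengths of binary code combinations produced by Huffman's procedure."""
    # Heap entries: (probability, unique tie-breaker, symbols in this group).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)  # the two least probable groups
        p2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1             # one more code symbol for each
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return lengths

probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
lengths = huffman_lengths(probs)
average_length = sum(p * n for p, n in zip(probs, lengths))
print(lengths, average_length, entropy(probs))
```

For this distribution the procedure yields combinations of lengths 1, 2, 3, and 3, the same lengths as the second code above, and the average length coincides with the 1.75 binary units per symbol.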

Coding  is  used  not  only  for  greater  efficiency  but  also  to  im- 
prove the  transmission  of  information.  For  the  latter  purpose, 
self-correcting  codes  are  employed.  The  principle  on  which 
these  codes  are  constructed  is  that  not  all  possible  combinations 
of symbols are used as code combinations, but only a part (i.e.,

¹² We can now make the concept of redundancy more concrete. The redundancy
in a message amounts to 50 per cent if, in translating it from a certain
code into the optimal code with the same number of symbols, we find that its
length has decreased by 50 per cent.

code combinations are constructed according to specific rules).
A  distortion  of  one  of  the  symbols  changes  a  code  combination 
into  a  set  of  symbols  that  is  not  a  legitimate  code  combination. 
If  the  distinction  among  combinations  is  great  enough,  one  can 
not  only  find  out  that  an  error  has  occurred  but  also  predict 
rather  accurately  which  actual  combination  should  take  the 
place  of  the  distorted  one.  The  construction  of  combinations 
according  to  specific  rules  signifies  a  conscious  introduction  of 
redundancy  into  the  code. 
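
The principle can be illustrated with the simplest code of this kind. The sketch assumes a hypothetical code in which each binary symbol is sent three times, so that only "000" and "111" are legitimate combinations out of the eight possible; the code and the names used are assumptions, not an example from the text.

```python
# Assumed code: only two of the eight three-symbol combinations are legitimate.
LEGITIMATE = {"000": "0", "111": "1"}

def distance(a, b):
    """Number of positions in which two combinations differ."""
    return sum(x != y for x, y in zip(a, b))

def receive(block):
    """Return (decoded symbol, whether a distortion was detected)."""
    if block in LEGITIMATE:
        return LEGITIMATE[block], False
    # Not a legitimate combination: an error occurred. Choose the
    # legitimate combination that differs from it in the fewest places.
    nearest = min(LEGITIMATE, key=lambda c: distance(c, block))
    return LEGITIMATE[nearest], True

print(receive("000"), receive("010"), receive("110"))
```

A single distorted symbol is both detected and corrected here, at the price of sending three symbols for every one, which is the redundancy deliberately introduced into the code.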

The  very  immediate  connection  between  the  quantity  of  in- 
formation in  a  message  and  an  economical  encoding  of  it  is  a 
proof  of  the  thesis  formulated  in  the  beginning  of  this  chapter, 
namely,  that  the  statistical  structure  of  a  message,  when  evalu- 
ated with  the  aid  of  the  concepts  of  information  theory,  is  the 
only  property  of  communication  that  can  be  considered  essen- 
tial from  the  standpoint  of  transmitting  it  over  communications 
systems. 

The  value  of  the  concept  of  information  developed  by  Shan- 
non is  not  limited  to  its  broad  possibilities  for  application  in 
communications  technology.  It  proves  fruitful  in  many  other 
areas  as  well.  Let  us  cite  several  examples.  The  concept  of  in- 
formation is  very  important  in  the  study  of  several  processes  of 
human  transmission  and  reception  of  information.  In  one  par- 
ticular experiment  [32],  the  degree  of  accuracy  in  understand- 
ing words  under  extremely  noisy  conditions  was  measured.  Dif- 
ficulty in  recognizing  words  (and  the  information  in  a  word) 
was  found  to  depend  on  knowledge  of  the  number  of  words 
from which a particular word was chosen.¹³ If the hearer knows,
for  example,  that  the  word  to  be  spoken  is  one  of  ten  figures 
(from  zero  to  nine),  then  the  word  will  be  understood  cor- 
rectly even  under  extremely  noisy  conditions.  But  if  he  does 
not  know  this,  he  will  not  comprehend  the  word  even  50  per 
cent  of  the  time.  In  this  way,  difficulty  in  comprehension  de- 
pends on  the  amount  of  information  contained  in  the  word. 

The  broad  applicability  of  Shannon's  concept  of  information 
is  also  illustrated  in  experiments  on  the  rate  of  performance  of 
several processes connected with the transmission of information
in the human organism [10]. The experiment is as follows.


¹³ [Other pertinent factors being held constant.—Tr.]

The subject must react in response to n different conditions;
e.g., in response to n different signals he must push one
of  n  knobs.  It  turns  out  that  the  time  needed  to  complete  such 
a  reaction  increases  as  a  logarithmic  function  of  the  number  of 
signals.  This  phenomenon  can  be  explained  only  if  we  accept 
the  idea  that  "messages"  in  the  human  nervous  system  are  opti- 
mally encoded,  and  therefore  the  length  of  a  message  (and, 
consequently,  the  time  needed  to  transmit  it)  is  proportional 
to  the  amount  of  information  contained  in  it. 

We  shall  now  consider  ways  of  applying  the  concepts  intro- 
duced above  to  linguistic  research. 


5.  Language  as  a  Code  with  Probabilistic  Limitations 

The  study  of  language  using  the  methods  of  information  theory 
requires  the  construction  of  a  definite  model  for  language.  The 
model  gives  a  simplified  picture  of  language  because  it  reflects 
only  certain  of  its  properties,  ignoring  the  others.  The  extrac- 
tion of  certain  properties  with  the  use  of  the  model  has  great 
significance,  since  this  permits  the  study  of  certain  properties 
by  exact  methods. 

The  model  on  which  we  shall  base  our  discussion  is  that  of 
language  defined  as  a  code  with  probabilistic  limitations.  Two 
aspects  must  be  elucidated:  (1)  In  what  respects  can  language 
be  said  to  approximate  a  code?  (2)  Why  do  the  probabilistic 
limitations  in  language  deserve  the  attention  of  linguists? 

5.1  Language  and  Code 

On  pp.  133  ff.  we  discussed  various  types  of  codes;  there  we 
were  referring  to  artificial,  specially  developed  notations  for 
messages  which,  until  they  are  encoded  in  a  particular  manner, 
exist  in  another,  "natural"  form— e.g.,  as  a  sequence  of  words  or 
letters.  However,  the  word  "code"  can  be  understood  in  an- 
other, broader  sense— as  any  method  of  writing  a  message.  Ac- 
cording to  G.  A.  Miller  (  [30],  p.  7),  "any  system  of  symbols 
that,  by  prior  agreement  between  the  source  and  destination,  is 
used  to  represent  and  convey  information  will  be  called  a  code." 
In this broad sense, language, too, can be called a code.

The identification of language with code is based primarily
on  the  fact  that,  in  language  as  in  technical  codes,  description 
of  the  combinability  of  elements  plays  an  important  role.  For 
a  technical  code,  the  rule  for  combination  of  symbols  is  the 
main  characteristic.  In  language,  aside  from  the  combinatory 
capabilities  of  the  units,  many  other  properties  are  essential; 
but  the  importance  of  describing  language  from  the  standpoint 
of  the  combinatory  capabilities  of  its  units  cannot  be  doubted. 
The  limited  nature  of  this  capability  is  one  factor  defining  the 
structure  of  language:  The  combinability  of  phonemes  is  no  less 
important  a  characteristic  of  a  phonological  system  than  the 
composition  of  the  phonemes  and  their  grouping  from  the 
standpoint  of  phonological  oppositions;  description  of  word 
formation  and  word  alternation  is  the  determination  of  the  laws 
of  combinability  of  morphemes  within  a  word;  analysis  of  the 
syntactic  structure  of  language  absolutely  demands  a  descrip- 
tion of  the  combinatory  capabilities  of  syntactic  word  classes 
within  a  phrase  or  clause.  A  description  of  combinability  can 
be  constructed  analogously  for  meaningless  units  such  as  the 
phoneme  or  syllable,  and  for  meaningful  ones,  i.e.,  for  words, 
morphemes, etc. From the standpoint of the combinatory capabilities
of language units, we shall equate the description of
language with the description of code.¹⁴

The identification of language with code sometimes occurs
for another reason. Every technical code requires, besides a
listing of its code combinations, a statement of the coding rules.
These rules define a method of comparing a combination in a
particular code with a symbol or group of symbols in another
code. Usually, a message undergoes several transformations on
the way from the source to the receiver, i.e., translations from
one code to another. The code applied in the part of the
communications system nearest to the source can be called
primary, relative to that used further on, i.e., the secondary
code (for the distinction between primary and secondary codes,
see [2]). The interrelation of language with the codes that serve
as primary codes with respect to language is still a very
complicated and unclear problem. Therefore, in speaking of the
codelike properties of language, we shall imply not the
monovalence of the coding rules but only the existence of
definite laws for the combinatory characteristics of the units.

¹⁴ The importance of describing the codelike aspects of language (in our sense)
was first emphasized in the distributional theory of language (see the expansion
of this theory developed by Z. S. Harris in his Methods in Structural
Linguistics, The University of Chicago Press, Chicago, Illinois, 1951). His
theory, however, proceeds from the assumption that description of the codelike
properties of language is restricted at present by the limited possibilities of
exact description of language, which is absolutely untrue.


5.2.  Determination  of  a  Code  by  Analysis  of  Messages 

In  technical  codes,  the  man  designing  the  code  determines 
the  rules  for  combining  units.  In  language,  the  rules  are  not 
immediately  given.  There  exists  only  the  totality  of  messages 
encoded  in  this  code  (text),  and  the  properties  of  the  code 
must  be  determined  by  analysis  of  messages.  However,  deter- 
mination of  the  rules  of  a  code  from  messages  is  complicated 
by  real  difficulties.  We  shall  discuss  some  of  the  more  basic  of 
these  in  detail. 

The  rules  governing  the  combinability  of  units  of  a  message 
differ  qualitatively.  For  example,  let  us  represent  messages  as 
sequences  of  words.  In  real  texts,  limitations,  especially  those 
due  to  the  language's  grammar,  are  imposed  on  the  combina- 
tory capacities  of  words;  the  presence  of  such  limitations  causes, 
for  example,  a  near-zero  probability  of  finding  the  following 
sequence  of  words  in  any  (grammatical)  Russian  utterance:  iz 
trudami matematicheskij po lingvistika.¹⁵ Not only grammatical
but  also  semantic  limitations  are  imposed  on  real  text.  As  a  re- 
sult of  the  existence  of  such  limitations,  one  would  hardly  en- 
counter in  real  text  such  a  statement  as,  for  example,  the  fol- 
lowing: Kvadrat  vypil  vsyu  trapetsiyu  [The  square  drank  up 
the  entire  trapezoid].  Grammatical  limitations  are  inherently 
different  from  restrictions  on  meanings.  If  it  seems  natural  to 
consider  grammatical  limitations  on  text  as  caused  by  the  code 
(i.e.,  as  codelike),  then,  obviously,  restrictions  on  meaning  have 
no  relation  to  the  code  but,  rather,  are  characteristics  of  the 
communication  itself.  In  fact,  all  real  messages  originate  from 
some concrete situation or other. The sentence about the square
has not been encountered because the corresponding situation
has not occurred; but if it had occurred, such a sentence would
have been created, and it would not have contradicted the
norms of the Russian language, unlike the utterance before it,
which was introduced as an illustration of the breaking of code
rules. Consequently, semantic limitations of the text are
naturally considered to be noncodal. (Regarding this, see [20].)

¹⁵ [No interpretation of this sequence could make it satisfy Russian rules of
grammatical agreement, word order, etc.; it is "ungrammatical." - Tr.]

The  formulation  of  laws  for  the  combinability  of  language 
units  in  the  form  of  determinate  rules  demands  that  essentially 
linguistic,  codelike  restrictions  be  separated  from  those  charac- 
terizing the  message  itself  and  having  no  relation  to  the  code. 
However,  the  criteria  that  would  permit  the  separation  of  the 
different  types  of  restrictions  are  not  clearly  enough  defined. 
Thus,  the  probability  is  very  high  that  in  all  Russian  texts  cre- 
ated up  to  now  we  shall  not  find  such  combinations  as:  vchera 
pojdu  [I  shall  go  yesterday]  (although  skoro  pojdu  [I  shall  go 
presently]  does  exist),  inogda  ubezhal  [He  had  sometimes  run 
away]¹⁶ (compare bystro ubezhal [He ran away fast]),
derevyannoe ispol'zovanie [wooden use] (compare polnoe
ispol'zovanie [complete use]), etc. Is the absence of such combinations
in  text  caused  by  the  code  of  language  or  by  "meaning"?  Obvi- 
ously, different  descriptions  will  provide  different  answers  to 
this  question. 

The second class of complicated problems is related to the
following circumstance. Language communications can be
studied on several "levels"¹⁷: on those of phonemes, syllables,
morphemes, words, letters, etc.; the combination of units on one
level is affected by limitations reflecting laws of combinability
of units not only on that level, but on others as well. Thus, the
phoneme combination [szh] is unallowable by the phonological
code of Russian; the combination [zhasas] is permissible by
that code but is not found in messages, because it does not
correspond to any meaningful unit of the language; the
combination [sazh-am] is not allowed for by morphological laws,
i.e., by the laws governing the combinability of phonemes in
words, while the combination [k sdzhu] is excluded by the laws
of syntax. In this way, phoneme chains are restricted, as are
the chains of units on all other levels of language.

¹⁶ [The verbal aspect indicates a unique action, as "He had sometimes run
away once." - Tr.]

¹⁷ By "level," we mean the "aspect" or "standpoint" from which we consider a
language communication. A level is characterized by the type of discrete units
of which we consider a communication to be composed.

If  we  make  it  our  problem  to  describe  the  laws  governing 
combinability  in  the  form  of  structural,  qualitative,  determi- 
nate rules,  then  we  must  separate  the  limitations  affecting  the 
laws  of  one  level  from  all  others.  But  the  criteria  of  separation, 
as  in  the  preceding  instance,  can  hardly  be  formulated  clearly 
enough. 

Finally,  one  more  type  of  difficulty  lies  in  the  fact  that  mes- 
sages can  break  the  laws  of  a  particular  code.  We  have  stated 
above  that  the  word  combination  iz  trudami  matematicheskij 
po  lingvistika  has  an  extremely  low  probability  of  being  en- 
countered in  real  messages,  but  it  is  not  excluded  entirely.  Ac- 
tually, such  "utterances"  can  occur  in  real  messages,  since  they 
can  be  made  by  someone  not  too  familiar  with  the  code  of  a 
particular  language.  Such  distorted  phrases  occur  not  infre- 
quently in  artistic  literature,  as,  for  example,  the  Chinese  girl 
Sin Bin's remark in Vs. Ivanov's tale, Bronepoezd 14-69:
"Moya Kitaya v poryadok nado?" ["Mine of China in order
must?"], or "Tvoya moya ne ponimaj!" ["Thine mine do not
understand!"].

We  can  cite  yet  another  example  of  broken  code  rules  in 
communications— an  example  relating  to  the  area  of  phonology. 
In  Russian,  the  combination  of  [n]  with  a  consonant  at  the  be- 
ginning of  a  word  is  not  ordinarily  possible  (compare  the  re- 
moval of  this  combination  in  such  abbreviations  as  NSO,  NTO, 
etc.,  which  are  pronounced  with  an  initial  [e];  then  compare 
SNO,  MKhAT,  etc.,  pronounced  without  the  [e],  since  unal- 
lowable combinations  of  consonants  at  the  beginning  of  a  word 
are  not  present  in  these  cases).  However,  the  combination  of 
[n]  with  a  consonant  is  possible  in  "exotic"  words  (e.g.,  ngana- 
sane,  nkundo). 

5.3.  Statistical  Approach  to  the  Combinatory 
Properties  of  Language  Units 

Because  of  the  complexity,  multiplicity,  and  close  overlapping 
of  the  various  laws  that  determine  the  structure  of  a  language 
text,  the  laws  of  language  code  cannot  be  described  in  the  same 
way as the rules for technical codes (e.g., as a simple iteration
of  combinations,  like  Morse  code,  or  as  an  indicator  of  certain 
restrictions,  like  self-correcting  codes).  Since  laws  based  on  a 
qualitative  analysis  of  the  combinatorial  capacities  of  units  on 
a  certain  level  are  always  limited  and  incomplete,  it  is  useful, 
in  addition,  to  formulate  laws  for  combinability  in  the  form  of 
statistical  laws.  In  this  case,  the  probabilistic  process  may  be 
the  text-producing  model;  in  particular,  one  can  consider  a 
Markov  process  to  be  such  a  model  (either  simple  or  complex). 
If we are to construct a model for text in the form of a simple
Markov process, we must define the probabilities of all symbols
for this text, as well as the conditional probabilities of the
first order. For example, if one represents Russian text as a
sequence of vowels and consonants (i.e., of phonemes), then a
Markov model for this text can be given as in Table 1, where C
stands for consonant, V for vowel.¹⁸ The probabilities of the
corresponding transitions are shown in the table.

TABLE 1

p(C) = 0.58;  p(V) = 0.42

Conditional probabilities p_i(j) of symbol j following symbol i
(each row sums to 1):

  i \ j      C       V
  C         0.26    0.74
  V         0.99    0.01
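The use of Table 1 as a text-producing model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows of the table are read as the preceding symbol and the columns as the following one, and the names `TRANSITIONS`, `INITIAL`, `generate`, and `stationary_p_consonant` are invented here, not taken from the source.

```python
import random

# Transition probabilities of Table 1, read as p(next | previous).
TRANSITIONS = {
    "C": {"C": 0.26, "V": 0.74},
    "V": {"C": 0.99, "V": 0.01},
}
INITIAL = {"C": 0.58, "V": 0.42}  # p(C) and p(V) from Table 1


def generate(length, seed=0):
    """Produce a chain of `length` (>= 1) C/V symbols from the simple Markov process."""
    rng = random.Random(seed)
    symbols = ["C", "V"]
    current = rng.choices(symbols, weights=[INITIAL[s] for s in symbols])[0]
    chain = [current]
    for _ in range(length - 1):
        row = TRANSITIONS[current]
        current = rng.choices(symbols, weights=[row[s] for s in symbols])[0]
        chain.append(current)
    return "".join(chain)


def stationary_p_consonant():
    """Long-run share of C implied by the transition table alone."""
    # Solve p = p * p(C|C) + (1 - p) * p(C|V) for p.
    c_after_c = TRANSITIONS["C"]["C"]
    c_after_v = TRANSITIONS["V"]["C"]
    return c_after_v / (1.0 - c_after_c + c_after_v)


print(generate(40))
print(round(stationary_p_consonant(), 2))
```

The long-run share of consonants implied by the transitions comes out at roughly 0.57, close to the value p(C) = 0.58 given with the table, which supports reading the rows as the conditioning symbol.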

The simple Markov process still gives a poor approximation
of real text, since the statistical bonds in language text act at
great distances; thus, the probability of encountering a certain
word at the end of some article or even a thick book depends,
in essence, on the kind of words composing its title [8].
Therefore, if we construct a statistical model of a given text to
calculate the dependency of one symbol only on the preceding one,
then, of course, this model will not reflect the long-range
connections. However, for the same text one can construct a
sequence of statistical models that will reflect the statistical
regularities of the text much more accurately (i.e., reflect the
dependencies throughout a large section of text), since the chains
of symbols created by the model will in every case "resemble"
the text more closely. We shall cite several examples of the
"product" of statistical models for Russian and English (taken
from [11] and [40]).

¹⁸ For data on conditional probabilities of Russian vowels and consonants, see
L. R. Zinder, "O lingvisticheskoj veroyatnosti" ["On Linguistic Probability"],
Voprosy yazykoznaniya, No. 2, 1958, pp. 121-125. (Translated: JPRS 6445, U.S.
Joint Publications Research Service, Series "K": Soviet Developments in
Information Processing and Machine Translation, December, 1960, pp. 7-14.)

I. Examples for Russian (The message is studied at the letter
level.) 

1.  A  zero-order  approximation  ("text"  consists  of  the  same 
alphabetic  symbols  as  real  text,  but  the  statistical  regularities 
of  real  text  are  not  reflected  at  all;  the  symbols  are  randomly 
distributed): 

sukherrob' dsh  yaykhvshchikhjtoifvna  rfenvshtdfrpkhshchgpch'- 

kizryas. 

2.  An  approximation  of  the  first  order  (the  letters  occur  with 
real-text  frequencies) : 

syntsyya'  oerv  odng  'uemklojk  zbya  entvsha. 

3.  An  approximation  of  the  second  order  (the  simple  Markov 
process;  combinations  of  two  letters  occur  with  real-text  fre- 
quencies, i.e.,  conditional  probabilities  of  the  first  order  are 
taken  into  account): 

umarono  kach  vsvannyj  rosya  nykh  kovkrov  nedare. 

4.  An  approximation  of  the  third  order  (each  three-letter 
combination  occurs  with  the  same  frequency  as  in  ordinary 
text): 

pokak  pot  postivlennyj  durnoskaka  nakonenno  zne 
stvolovil  se  tvoj  obnil'. 

5.  An  approximation  of  the  fourth  order  (four-letter  combi- 
nations occur  with  the  frequencies  found  in  ordinary  text) : 

vesel  vrat'sya  ne  sukhom  i  nepo  i  korko. 
II. Examples for English (The message is considered on the
word  level.) 

1.  An  approximation  of  the  first  order  (the  words  occur  with 
real-text  frequency) : 

representing and speedily is an good art or come can different
natural here he the a in come the to of to expert gray.

2.  An  approximation  of  the  second  order  (two-word  combi- 
nations occur  with  real-text  frequency  in  English) : 

the  head  and  in  the  frontal  attack  on  an  English  writer  that 
the  character  of  this  point  is  therefore  another  method  for 
the  letters  that  the  time  of  who  ever  told  the  problem  for 
an unexpected.
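Approximations of this kind can be imitated by counting the combinations found in a sample text and drawing each new symbol conditionally on the preceding ones. The sketch below is a minimal illustration, not the procedure used in [11] or [40]; the function names and the tiny sample text are placeholders. It follows the numbering used above: order 1 reproduces single-symbol frequencies, order 2 reproduces two-symbol combinations, and so on.

```python
import random
from collections import Counter, defaultdict


def build_model(symbols, order):
    """Count which symbol follows each context of (order - 1) symbols."""
    context_len = order - 1
    model = defaultdict(Counter)
    for k in range(context_len, len(symbols)):
        context = tuple(symbols[k - context_len:k])
        model[context][symbols[k]] += 1
    return model


def approximate(symbols, order, length, seed=0):
    """Generate `length` symbols imitating the `order`-symbol statistics of the sample."""
    rng = random.Random(seed)
    context_len = order - 1
    model = build_model(symbols, order)
    out = []
    while len(out) < length:
        context = tuple(out[-context_len:]) if context_len else ()
        followers = model.get(context) if len(context) == context_len else None
        if followers:
            choices, weights = zip(*followers.items())
            out.append(rng.choices(choices, weights=weights)[0])
        else:
            # No full context yet (start of the chain) or a dead end:
            # fall back to a symbol drawn with its plain sample frequency.
            out.append(rng.choice(symbols))
    return out


# Placeholder sample; a real experiment needs a large body of text.
sample = "the head and the frontal attack on the problem and the method"
print("".join(approximate(sample, order=3, length=40)))          # letter level
print(" ".join(approximate(sample.split(), order=2, length=8)))  # word level
```

The same function serves both levels because it only assumes a sequence of discrete units: a string is treated as a sequence of letters, a list as a sequence of words.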

The  statistical  approach  proposed  for  describing  combinatory 
capabilities  possesses  several  internal  restrictions.  In  such  an  ap- 
proach, the  combinability  of  language  units  cannot  be  described 
individually  and  in  detail.  The  statistical  approach  does  not  al- 
low for  a  reflection  of  the  qualitative  diversity  of  relations 
among  elements  (e.g.,  the  multiplicity  of  the  grammatical  rela- 
tions among  words),  or  for  the  separation  of  qualitatively  dis- 
tinct regularities.  The  "syntactic  rules"  of  language  in  such  a 
model  provide  only  for  the  conditional  probability  that  one  ele- 
ment will  follow  another.  However,  the  formulation  of  laws  for 
a  language  code,  in  the  form  of  statistical  rules  of  the  indicated 
sort,  possesses  a  certain  basic  worth:  The  statistical  laws  of  com- 
binability can  be  made  explicit  from  immediate  observation  of 
text;  in  order  to  formulate  such  rules,  one  does  not  have  to  de- 
velop criteria  for  the  idealization  of  a  text  that  absolutely  must 
be  introduced  in  order  to  divide  limitations  among  various  lev- 
els and  to  separate  code  restrictions  from  noncode  ones. 

Thus,  a  premise  for  the  study  of  language,  using  the  methods 
of  information  theory,  is  to  approach  language  as  a  code  and  to 
formulate  laws  for  this  code  in  the  form  of  statistical  rules  of 
a  definite  type.  This  is  exactly  what  we  have  in  mind  when  we 
say  that  language  study  through  the  methods  of  information 
theory  is  based  on  a  model  of  language  as  a  code  with  probabi- 
listic restrictions. 


144    Information  Theory  and  the  Study  of  Language 

5.4.  The  Discrete  and  Continuous  Quality  of  a 
Speech  Signal 

Our  model  for  language  is  based  on  the  assumption  that 
speech  is  a  sequence  of  discrete  units.  Linguistics  also  proceeds 
from  this  assumption;  one  of  its  main  problems  is  the  construc- 
tion of  alphabets  of  discrete  language  units.  But  this,  especially 
for  spoken  utterances,  is  complicated  by  basic  difficulties.  The 
carrier  of  a  spoken  message  is  a  sound  wave;  described  acousti- 
cally, from  the  standpoint  of  oscillatory  frequency,  time,  and 
intensity,  it  is  continuous.  This  continuity  yields  to  discreteness 
only  when  a  man  receives  the  signals;  a  quantization  of  the 
signal  occurs,  transforming  the  signal  into  a  sequence  of  discrete 
units.  Such  a  discrete  unit,  for  the  phonetic  aspect  of  language, 
is  usually  called  a  phoneme.  However,  the  principles  according 
to  which  speakers  induce  this  quantization  still  remain  unex- 
plained; at  any  rate,  this  process  cannot  be  "produced"  by  a 
machine. 

The  latest  achievements  in  the  technique  of  sound  analysis 
(particularly  the  invention  of  the  spectrograph)  have  given 
birth  to  a  number  of  attempts  to  separate  out  the  acoustical  in- 
variant for  a  large  class  of  physically  distinct  sounds  accepted  by 
speakers  as  one  phoneme.  If  such  an  invariant  had  been  found, 
we  would  have  arrived  at  the  point  at  which  a  "rounding"  of 
the  acoustical  characteristics  of  sound  must  occur  during  its  re- 
ception. However,  these  attempts  have  failed. 

The work of R. O. Jakobson, C. G. M. Fant, and M. Halle¹⁹ has
been  most  valuable  for  explanation  of  the  principles  of  quanti- 
zation, for  they  have  shown  that  acoustical  correlates  can  be  de- 
fined not  for  phonemes,  but  for  the  distinctive  features  of  pho- 
nemes. A  distinctive  feature  is  not  an  absolute,  but  a  relative, 
characteristic  of  sound;  it  is  a  constancy  of  distinction  between 
a  certain  sound  and  other  sounds  pronounced  by  the  speaker 
in  analogous  circumstances.  Thus,  the  "rounding"  is  evidently 
oriented  toward  the  distinctive  feature  rather  than  toward  the 
phoneme.

¹⁹ See R. O. Jakobson, C. G. M. Fant, and M. Halle, Preliminaries to Speech
Analysis, Technical Report No. 13, Massachusetts Institute of Technology,
Cambridge, Massachusetts, 1952.

Moreover, it has been shown that all features can be considered
as binary, i.e., as taking two values. For example, the
feature of "consonant versus vowel" has the values "vowel" and
"consonant,"  and  that  of  "voicedness"  has  the  values  "voiced" 
and  "unvoiced,"  etc.;  thus,  the  process  of  transmission  and  re- 
ception of  spoken  messages  is  a  process  of  encoding  and  decod- 
ing the  message  in  a  binary  code.  The  authors  introduce  a  list 
of  twenty  features  on  which  the  phonological  distinctions  made 
in  most  of  the  world's  languages  are  based  (each  language  "se- 
lects" as  its  distinctive  features  a  particular  group  from  among 
these  twenty). 
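The idea of a phoneme as a bundle of two-valued features can be pictured as a small binary code. The sketch below is purely illustrative: it uses only the two features named in the text, and the three-phoneme inventory with its feature values is a toy assumption, not a description of any real language. All names are invented for the example.

```python
from math import ceil, log2

# Each feature takes one of two values, written here as 0 or 1.
FEATURES = ("vowel", "voiced")  # 1 = vowel / voiced, 0 = consonant / unvoiced

# Hypothetical toy inventory, for illustration only.
PHONEME_CODE = {
    "a": (1, 1),
    "b": (0, 1),
    "p": (0, 0),
}


def encode(phonemes):
    """Write a string of phonemes as one string of binary digits."""
    return "".join(str(bit) for ph in phonemes for bit in PHONEME_CODE[ph])


def decode(bits):
    """Recover the phonemes from the binary string."""
    width = len(FEATURES)
    lookup = {code: ph for ph, code in PHONEME_CODE.items()}
    groups = [bits[k:k + width] for k in range(0, len(bits), width)]
    return "".join(lookup[tuple(int(b) for b in g)] for g in groups)


def min_features(inventory_size):
    """Smallest number of binary features able to separate that many phonemes."""
    return ceil(log2(inventory_size))


print(encode("pab"))
print(decode(encode("pab")))
print(min_features(42))
```

Since k binary features distinguish at most 2^k units, an inventory of 42 phonemes needs at least six of them; a real feature system may use more, because not every combination of values occurs.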

However,  the  number  of  unsolved  problems  in  this  area  is 
very  large,  and  there  exists  no  reliable  and  generally  acceptable 
list  of  phonemes  and  distinctive  features  for  any  language. 

Linguistics  asserts  that  language  communication  can  be  re- 
garded as  a  sequence  of  discrete  units  of  another  kind:  mean- 
ingful units— morphemes  or  words.  However,  the  construction 
of  an  alphabet  of  semantic  units  connected  with  the  analysis  of 
meaning  is  an  even  more  difficult  problem  than  that  of  compos- 
ing an  alphabet  of  units  for  the  phonetic  aspect  of  language 
(see  Chapter  II). 

A  message  written  in  a  natural  language  has  a  clearly  ex- 
pressed discrete  character.  The  alphabet  of  letters  for  a  written 
message  is  known  to  every  literate  person,  and  the  entire  word 
list  can  be  formulated,  at  least  for  limited  application  of  the 
language,  on  the  basis  of  appropriate  texts  (if  one  always  con- 
siders the  bounds  of  a  word  to  be  spaces  and  different  words  to 
be  different  sequences  of  letters  between  spaces,  then  this  pro- 
cedure does  not  present  any  basic  difficulties). 
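Under the convention just stated (words are bounded by spaces, and different letter strings are different words), compiling a word list is a mechanical count. A minimal sketch, with a hypothetical function name and a made-up sample sentence:

```python
from collections import Counter


def word_list(text):
    """Treat every distinct string of letters between spaces as a separate word."""
    return Counter(text.split())


counts = word_list("the code of the language and the code of the text")
total = sum(counts.values())
for word, count in counts.most_common(3):
    print(word, count, round(count / total, 3))
```

Note that under this convention different forms of one word (for example, `code` and `codes`) are counted separately, while homographs are merged.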

Most  of  the  more  accurately  conducted  experiments  on  lan- 
guage using  information  theory  have  been  carried  out  by  indi- 
viduals generally  unfamiliar  with  data  (admittedly  inconclu- 
sive) about  the  alphabets  of  discrete  units  of  language  "proper" 
[i.e.,  spoken  language— Tr.]  that  linguists  have  presented.  For 
this  reason,  one  cannot  be  surprised  that  most  research  deserv- 
ing of  attention  relates  to  "written  language." 

Contemporary  linguistics  does  not  recognize  written  language 
as  a  fully  justified  object  of  research,  since  it  considers  written 
language a secondary (and, consequently, artificial) phenomenon.
However accurate this attitude may be, written language
cannot be disregarded.

In  contemporary  society,  written  speech— the  bearer  of  in- 
formation to  be  transmitted  to  the  public— plays  an  enormous 
role,  and  the  practical  value  of  research  in  this  area  is  indis- 
putable. Data  on  written  language  are  therefore  definitely 
worthy  of  attention  for  their  own  sake.  Furthermore,  the  entire 
methodology  and  general  principles  of  linguistic  research,  using 
the  concepts  of  information  theory,  do  not  change  in  any  essen- 
tial respect  when  applied  to  spoken  language  as  well;  it  is  im- 
portant only  that  one  know  how  to  write  a  spoken  message  as 
a  sequence  of  discrete  units.  Thus,  the  fact  that  most  published 
data  refer  to  written  language  must  not  lead  to  the  conclusion 
that  the  value  of  the  results  is  restricted. 


5.5.  Statistical  Data  on  Language 

As  we  have  said,  a  statistical  model  of  language  is  proposed 
as  a  model  for  the  probabilities  and  conditional  probabilities 
for  all  the  symbols  of  the  alphabet.  The  relative  frequencies  of 
phonemes  and  letters  are  constant,  at  least  for  a  limited  area  of 
linguistic  usage  (i.e.,  for  texts  in  a  specific  area  of  science,  for 
the  speech  of  a  certain  language  community,  etc.).  Hence,  one 
can  speak  of  the  probabilities  of  these  symbols  for  a  given,  re- 
stricted "language,"  or  even  for  language  as  a  whole.  No  com- 
plicated procedure  is  needed  to  obtain  reliable  data  on  the 
probabilities  of  combinations  of  two  or  three  letters  or  sounds 
(for  statistical  data  on  phonemes  and  letters  in  Russian  and 
English,  see  [18],  [21],  [36]  and  the  bibliography  for  Chapter 
V,  [4]  and  [49];  for  data  on  the  frequencies  of  Russian  pho- 
nemes, see  the  table  on  p.  150). 

The  frequencies  of  words  vary  greatly  from  one  text  to  an- 
other. This  is  especially  true  of  rare  words,  because  in  the  ex- 
isting frequency  dictionaries  (see  bibliography  for  Chapter  V, 
[22],  [29],  [30],  and  [40]),  one  can  only  consider  the  conclusions 
for  the  first  thousand  most  frequent  words  (see  bibliography  for 
Chapter  V,  [10])  to  be  reliable.  There  exist  practically  no  data  on 
the  frequencies  of  phrases. 

Moreover,  the  connections  among  words  possess  such  a  strong 
"distant  action"  (see  p.  141)  that  any  Markov  model  based,  for 
example, on conditional probabilities of the second order
yields a very poor approximation to the sense-sequences of words
in real text (for examples, see above).

Thus,  we  possess  syntactic  data  that  are  sufficient  only  for 
constructing  statistical  models  of  language  at  the  levels  of  let- 
ters and  phonemes.  We  do  not  consider  current  conclusions 
about  probabilities  or  the  composition  of  an  alphabet  to  be  suf- 
ficient for  constructing  statistical  models  of  language  at  the  level 
of  meaning  units. 


6.  Measurement  of  Information  and  Redundancy 
in  Natural  Languages 

6.1.  Calculation  of  the  Amount  of  Information  in 
Letters  and  Phonemes  on  the  Basis  of  their 
Probabilities 

In  order  to  calculate  the  amount  of  information  per  unit  of 
language code, using formulas (1)-(9) (see Secs. 2 and 3), one
must  know  the  probabilities  and  conditional  probabilities  of 
these  units.  Quantity  of  information  and  conditional  entropy 
have been calculated for several European languages on the
basis of available statistical data ([4], [15], [18], [38]).²⁰ The
results are presented in Table 2. For written utterances, the
quantity of information per letter is shown;²¹ for spoken, per
phoneme.  Data  for  speech  are  only  present  for  Russian;  the 
dashes  in  the  table  indicate  the  absence  of  data. 

²⁰ Many of the numerical data cited below are not conclusive and need
further verification. Even in such cases, however, the methodology whereby they
were obtained is independently interesting, and they therefore deserve our
attention.

²¹ In calculating from the Russian written material, it was agreed that the
total number of letters is 32: е and ё, ь and ъ are considered to be the same
letter; space is counted as a letter. In English, the space is also considered
a letter; the counts for German, French, and Spanish are based on an alphabet
of 26 letters (space is not counted as a letter, and the above-the-line symbols,
i.e., accents and umlauts, are not counted); the difference between the value of
$H_0$ for English and those for other European languages is partially explained
by this difference in conventions ($\log_2 27 = 4.76$; $\log_2 26 = 4.70$).

TABLE 2

Amount of information (conditional entropy)

Language   Form       H_0     H_1     H_2     H_3
Russian    spoken     5.38    4.76    3.69    0.70
Russian    written    5.00    4.35    3.52    3.01
English    written    4.76    4.03    3.32    3.10
German     written    4.70    4.10    -       -
French     written    4.70    3.98    -       -
Spanish    written    4.70    4.01    -       -

We shall consider the process of calculating the quantity of
information in spoken Russian in further detail (these
calculations are described in a paper by C. Cherry, M. Halle, and
R. O. Jakobson [18]). The amount of information is calculated on
the  basis  of  a  representation  of  speech  as  a  string  of  phonemes. 
Generally  speaking,  the  string  of  phonemes  does  not  reflect  all 
the  information  contained  in  the  utterance.  Aside  from  certain 
"sense-distinguishing"  information,  the  usual  province  of  lin- 
guistics, the  utterance  also  contains  an  enormous  amount  of  ad- 
ditional information:  It  transmits  emotional  overtones,  has  a 
logical  impact,  gives  information  about  the  speaker  (one  can 
recognize the speaker by his voice), etc. The amount of
"sense-distinguishing" information in a portion of speech
corresponding to the phoneme does not exceed 7 u [i.e., 7 bits - Tr.]
(the number of phonemes in a language is in no instance greater
than $2^7$: certain Caucasian languages contain the most
phonemes, but even there the number does not exceed 80).²² In
modern communication lines, speech is usually transmitted
within a frequency band of 7,000 cps; this means that for an
average transmission rate of 10 phonemes per second, the
quantity of information in a piece of speech corresponding to a
phoneme amounts to 5,600 u.²³ Hence, it is evident what a small
part of all the information in speech sound is
"sense-distinguishing" and is preserved when speech is represented
as a string of phonemes (for several measurements of the value of
"non-sense-distinguishing" information, see [10], p. 109).

²² The function of distinction is fulfilled not only by phonemes but also by
such "nonsegmental" elements as stress and intonation; the authors consider
stress to be one of the differential attributes of phonemes, but they ignore the
sense-distinguishing role of intonation.
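The bound used here, and the $H_0$ column of Table 2, both rest on the fact that a symbol drawn from an alphabet of N units carries at most log2 N bits. The short sketch below checks the figures quoted in the text; the function name and the labels are illustrative, while the alphabet sizes and the 5,600 u figure are those stated in the text.

```python
from math import log2


def max_information(alphabet_size):
    """Upper bound, in bits, on the information carried by one symbol."""
    return log2(alphabet_size)


sizes = [
    ("Russian phonemes", 42),
    ("Russian letters incl. space", 32),
    ("English letters incl. space", 27),
    ("26-letter alphabets", 26),
    ("largest phoneme inventories cited", 80),
]
for name, size in sizes:
    print(f"{name}: {max_information(size):.2f}")

# Share of the 5,600 u per phoneme-length piece of speech that can be
# "sense-distinguishing", even for the largest inventory mentioned.
share = max_information(80) / 5600
print(f"at most {share:.2%} of the transmitted information")
```

The computed values agree with the $H_0$ column to within rounding, and even an 80-phoneme inventory stays below the 7-bit ceiling, so the sense-distinguishing part is on the order of a tenth of a percent of what the channel carries.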

The authors start with the fact that there are 42 phonemes
in Russian (see Table 3). The value $H_1$ for Russian text was
calculated from data on phoneme frequency. Selections of
conversation were used for the frequency count, as described and
transcribed by A. M. Peshkovsky²⁴ (the text contained 10,000
phonemes in all). If the frequency is known, then the amount
of information can be calculated from formula (1) as follows:

$$
H_1 = -\sum_{i=1}^{n} p(i) \log_2 p(i)
    = -(0.13 \log_2 0.13 + 0.10 \log_2 0.10 + \cdots + 0.0007 \log_2 0.0007)
    = 4.76 \text{ u}.
$$

(The terms on the right are $p(i) \log_2 p(i)$ for all phonemes;
see Table 3.)
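The same sum is easy to evaluate mechanically once the frequencies are known. The sketch below is a generic illustration of formula (1), not a recomputation of the 4.76 u figure (the full frequency list of Table 3 is not reproduced here); the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(probabilities):
    """H = -sum p(i) log2 p(i); zero-probability symbols contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def first_order_entropy(symbols):
    """Estimate H1 from the relative frequencies in a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return entropy(count / total for count in counts.values())


# Two-symbol alphabet with p(C) = 0.58 and p(V) = 0.42, as in Table 1.
print(round(entropy([0.58, 0.42]), 3))
# Estimate from a short, made-up string of symbols.
print(first_order_entropy("abcd"))
```

For the consonant/vowel alphabet of Table 1 this gives a little under one bit per symbol, as expected for two symbols of nearly equal probability.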

The values $H_2$ and $H_3$ are calculated from data on the
frequencies of two- and three-phoneme combinations. Frequencies
were calculated on the same text; only phoneme combinations
existing within a word were considered (boundaries within
complex words and compounds were treated as word boundaries).
On the basis of the frequencies of two-phoneme combinations,
it is possible to calculate the amount of information in
a combination using formula (1):

$$
H' = -\sum_{j=1}^{n} \sum_{i=1}^{n} p(j, i) \log_2 p(j, i) = 8.45 \text{ u};
$$

hence, from formula (6) (p. 130), $H_2 = H' - H_1 = 3.69$ u. The
value $H_3 = 0.7$, obtained by Cherry et al., is clearly too low. As
a matter of fact, not all combinations were considered during
the calculation (see [18], p. 36). In addition, the text length
(10,000 phonemes) is obviously insufficient for accurate
estimation of the frequencies of three-phoneme combinations.

²³ See R. M. Fano, "Information Theory Point of View in Speech
Communication," Journal of the Acoustical Society of America, Vol. 22, 1950, p. 691.

²⁴ See A. M. Peshkovsky, Desyat' tysyach zvukov [Ten Thousand Sounds] (a
collection of articles), Gosizdat, Leningrad-Moscow, 1925, pp. 167-191.
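The step from the entropy of two-symbol combinations to the conditional entropy is a simple subtraction, and the same scheme extends to longer combinations. The following sketch assumes that combinations are counted over one continuous sequence (the study described above counted only within words); the function names are hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(symbols, n):
    """Entropy, in bits, of the n-symbol combinations found in the sequence."""
    blocks = Counter(tuple(symbols[k:k + n]) for k in range(len(symbols) - n + 1))
    total = sum(blocks.values())
    return -sum(c / total * log2(c / total) for c in blocks.values())


def conditional_entropy(symbols, n):
    """H_n: entropy of n-symbol blocks minus entropy of (n - 1)-symbol blocks."""
    if n == 1:
        return block_entropy(symbols, 1)
    return block_entropy(symbols, n) - block_entropy(symbols, n - 1)


# The figures quoted in the text: 8.45 u per pair, 4.76 u per phoneme.
print(round(8.45 - 4.76, 2))

# A strictly alternating sequence: the next symbol is fully determined,
# so the second-order value is (up to edge effects) zero.
print(conditional_entropy("ab" * 500, 2))
```

On short samples such estimates are unreliable: the number of possible combinations grows quickly with n, so many of them are seen once or never, which is the difficulty noted above for three-phoneme combinations in a text of 10,000 phonemes.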

The  technique  for  obtaining  data  on  the  quantity  of  infor- 
mation in  a  letter  of  written  text  does  not  require  further  ex- 
planation. 

However, the values $H_1$, $H_2$, and $H_3$ do not yield a very
close  approximation  to  the  actual  information  per  text  symbol, 
since  they  account  only  for  the  statistical  dependencies  within 
two  to  three  symbols,  whereas  the  span  of  these  dependencies 
is,  in  fact,  much  broader.  One  cannot  obtain  more  accurate  data 
on  the  quantity  of  information  in  language  text  by  the  same 
means,  since  calculation  of  the  frequencies  of  combination  for 
strings  of  more  than  three  letters  or  phonemes  is  excessively 
unwieldy  and  cannot  be  performed  practically  by  hand. 

C. E. Shannon [38] proposed an indirect method of calculating
the amount of information per letter. First, one must
calculate the value H1 for a word. This value can be calculated
from word frequencies. Shannon used G. Dewey's frequency
dictionary [21]. (In this dictionary, all different strings of
letters between two spaces are considered different words, so
that all the forms of a word are counted as different words, while
homographs are counted as the same word.) The amount of
information in an English word is 11.82 u. The average length of
a word in English is s = 4.5 letters (see bibliography for
Chapter V, [49]); a space lengthens a word by one letter. If the
quantity of information per combination of 5.5 letters of text
averages out as 11.82 u, then the information per letter is
11.82 u/5.5 = 2.14 u. This calculation takes into account the
statistical bonds over spans of five letters of text and thus
yields a closer approximation to the actual amount of information
per letter of text. (In general, identification of the amount of
information per word with the amount per average combination of
s + 1 letters is not entirely proper; since statistical bonds among
the letters within a word are, on the average, stronger than
those existing among five arbitrary letters, the amount of
information per combination of s + 1 letters will be greater than
the amount per word; see [10].)
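
The indirect estimate can be written out as a short Python sketch. The function names are invented for illustration; only the figures 11.82 u and 4.5 letters come from the text, and the word-frequency table passed to `word_entropy` would have to be supplied by the reader.

```python
import math


def word_entropy(word_counts):
    """Entropy in bits per word, from a mapping word -> frequency count."""
    total = sum(word_counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in word_counts.values() if c > 0)


def info_per_letter(h_word, mean_word_length):
    """Spread the per-word information over s + 1 symbols (word plus space)."""
    return h_word / (mean_word_length + 1)


# Figures quoted in the text: 11.82 u per word, s = 4.5 letters.
estimate = info_per_letter(11.82, 4.5)
```

The quotient is about 2.149, which the text quotes as 2.14 u.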
Analogous calculations were performed for French, German,
and Spanish [15]. For these languages, there exist no
dictionaries showing individual form frequencies; frequency is
given for each word as a whole (all forms taken together), for
which reason the identification of the quantity of information per
word with the average quantity for s + 1 letters of text is even
less acceptable.
The  results  of  calculating  the  amount  of  information  per  letter 
by  this  means  are  presented  in  Table  4.  (The  data  for  English 
are  presented  for  the  sake  of  comparison,  as  calculated  from  a 
similar  dictionary.) 


TABLE 4

                 English    French    German    Spanish

u per letter       1.65      3.02      1.08       1.97

6.2.  Experimental  Methods  for  Determining  the 
Amount  of  Information  and  Redundancy 

An  approximation  of  a  higher  order  to  the  quantity  of  in- 
formation per  letter  can  be  obtained  by  several  experimental 
methods  developed  by  Shannon  [38].  The  main  difficulty  in 
calculating  the  amount  of  information  in  language  text,  as  is 
evident  from  what  has  been  said,  lies  in  the  fact  that  one  must 
take  into  account  statistical  dependencies  that  are  active  over 
rather  long  spans.  It  is  possible,  however,  to  discover  the  sta- 
tistical structure  of  a  text  by  means  other  than  the  calculation 
of  combination  frequencies.  Every  man  who  speaks  a  particu- 
lar language  possesses  a  knowledge  of  the  statistical  structure  of 
text  in  that  language,  albeit  unconsciously.  This  knowledge 
manifests itself especially in the fact that when someone is asked
to guess the following (call it the nth) letter in a text in which
the n - 1 preceding letters are known to him, his guesses will
not  be  entirely  haphazard;  they  will  be  based  on  an  intuitive 
knowledge  of  the  probabilities  and  conditional  probabilities  of 
letters. 

After performing enough letter-guessing experiments, one
can estimate the probability, for each given number of preceding
letters, that a following (nth) letter will be guessed correctly
at the first, second, etc., attempts (since there are 27 letters in
the English alphabet, i.e., 26 letters plus the space, the
maximum number of attempts is 27). Shannon has shown that there
are sufficient data in guess probabilities to calculate the
conditional entropy of the (n - 1)th order, i.e., the value Hn (or
rather, one can find the interval within which the entropy is to
be found, i.e., the upper and lower bounds^25).

In Figure 8, we present a graph constructed from Shannon's
data to show the decrease in the quantity of information per
letter as the number of letters increases, when the dependencies
among letters are taken into account in calculating amount of
information. (The value H, the amount of information, is
represented as a function of n, where n is the number of letters
known to the subject as he makes his guesses.)

Figure 8. Information, H, per Letter as a Function
of the Number, n, of Letters.


^25 The upper and lower bounds of the quantity of information are defined by
the following inequality:

$$\sum_{i} i\,(q_i^{n} - q_{i+1}^{n})\,\log_2 i \;\le\; H_n \;\le\; -\sum_{i} q_i^{n}\,\log_2 q_i^{n},$$

where $q_i^{n}$ is the probability of guessing a letter at the ith attempt when
the n - 1 preceding letters are known. For further details, see [11].
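
The bounds obtained from guess probabilities can be sketched in Python. The sketch assumes the inequality has the standard form given by Shannon, with the guess probabilities listed in non-increasing order and summing to one; the function name and the sample probabilities are invented for illustration.

```python
import math


def shannon_bounds(q):
    """Return (lower, upper) bounds in bits for the conditional entropy.

    q[i - 1] is the probability that the letter is guessed correctly at
    the i-th attempt; the list is assumed non-increasing and to sum to 1.
    """
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    padded = list(q) + [0.0]  # probability of an attempt beyond the last is 0
    lower = sum(i * (padded[i - 1] - padded[i]) * math.log2(i)
                for i in range(1, len(q) + 1))
    return lower, upper


# Hypothetical guess probabilities for a two-attempt situation.
low, high = shannon_bounds([0.75, 0.25])

# Equiprobable guesses over 27 symbols: both bounds equal log2(27).
low27, high27 = shannon_bounds([1 / 27] * 27)
```

When every attempt is equally likely to succeed, the two bounds coincide at the maximum value; the more the first attempts dominate, the lower both bounds fall.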

The  redundancy  of  a  given  language  can  be  calculated  from 
data  on  quantity  of  information,  with  the  same  approximation 
(using  formula  (9),  p.  131).  Calculated  from  H3,  the  redundancy 
of  Russian  is  39.8  per  cent;  that  of  English  is  30.7  per  cent.  The 
approximate  value  of  the  redundancy  can  also  be  found  by 
means  of  the  following  very  simple  experiment.  Someone  is 
made  to  guess  letter  after  letter  in  a  text  of  a  definite  length 
(e.g.,  a  selection  100  letters  long  every  time);  if  the  next  letter 
is  not  guessed  correctly,  the  subject  is  told  the  right  letter,  and 
the experiment continues. We cite as an example the results of
one experiment in English. On the upper line, the entire
selection is given; on the lower line, only those letters that had
not been guessed (a dash stands for each letter guessed
correctly); the space between words is counted as one letter.

The room was not very light a small oblong reading lamp on
----roo------not-v-----i------sm----obl----rea----------o-

the desk shed glow on polished wood but less on the shabby
----d----shed-glo--o--p-l-s------o--bu--l-s--o------sh----

red carpet,
re--c-----

There  are  129  letters  in  all  in  the  selection;  89  letters— i.e., 
69  per  cent— are  guessed  correctly;  hence  redundancy  is  evalu- 
ated at  69  per  cent.  The  figure  obtained  as  a  result  of  the  ex- 
periment, when  the  latter  is  conducted  on  a  large  enough  num- 
ber of  selections,  can  be  accepted  as  an  approximate  value  for 
the  redundancy  of  a  given  language. 
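
The tally behind this estimate is simple enough to sketch in Python. The function names are invented here, and the convention of marking a correctly guessed letter with "-" in the record of the experiment is an assumption made for the example.

```python
def count_guessed(record, guessed_mark="-"):
    """Return (guessed, total) for a record in which each guessed
    letter is replaced by guessed_mark (an assumed convention)."""
    return record.count(guessed_mark), len(record)


def redundancy_from_guesses(guessed, total):
    """Fraction of letters guessed correctly, taken as the redundancy."""
    if total <= 0 or not 0 <= guessed <= total:
        raise ValueError("need 0 <= guessed <= total and total > 0")
    return guessed / total


# Figures quoted in the text: 89 of 129 letters guessed correctly.
estimate = redundancy_from_guesses(89, 129)
percent = round(100 * estimate)
```

Averaging such fractions over many selections gives the approximate redundancy referred to in the text.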

This  method  is  particularly  interesting  because  the  experi- 
ment here  can  be  conducted  not  only  on  letters,  but  on  words 
as  well.  (In  the  previous  guessing  experiment,  letters  could  not 
be  exchanged  for  words:  The  maximum  number  of  attempts 
for  letters  is  limited  by  the  length  of  the  alphabet,  whereas  for 
words,  the  number  of  attempts  can  prove  too  great.) 

Analogous  experiments  have  been  made  in  guessing  a  mes- 
sage represented  as  a  string  of  phonemes.  D.  G.  Fry  [25]  ob- 
tained a  redundancy  for  English  text  of  55  per  cent.  However, 
these  experiments  apparently  yield  a  lower  per  cent  of  redun- 
dancy just  because  "guessing  games"  are  more  unusual  for  pho- 
nemes. The  end  of  the  road  for  experimental  research  on  the 
redundancy  of  spoken  language  is  not  as  yet  in  sight. 
6.3. The Upper Limit of Redundancy

From his experiments, Shannon obtained the value of
information only for n = 15 and n = 100, whereas the increase in
redundancy  over  this  interval  is  rather  significant  (see  Figure 
8).  It  is  important  to  determine  how  far  redundancy  will  in- 
crease as  the  amount  of  text  selected  increases.  Is  there  a  limit 
beyond  which  an  increase  in  the  amount  of  text  will  not  be  ac- 
companied by  increase  in  redundancy  (i.e.,  some  maximum  dis- 
tance over  which  the  statistical  bonds  among  text  symbols  have 
effect)?  These  questions  were  posed  in  a  paper  by  N.  G.  Bur- 
ton and  J.  C.  R.  Licklider  [16]. 

Figure 9 shows the results of a determination of the
redundancy (R) for n = 2, 4, 6, 8, 16, 32, 64, 128, and 10,000.
Redundancy was estimated by Shannon's method (see pp. 152-154).
The two curves correspond to the upper and lower limits of
redundancy. As n increases from 0 to 32, a sharp increase in the
redundancy is observed, but with further increase in the
context from 32 to 10,000, no essential increase in redundancy takes
place. Thus, the increase in redundancy with increase in
context is not infinite. The extreme redundancy for English lies in
the interval between 55 and 75 per cent, while the span of
activity of the statistical bonds among the letters in text is about
30 letters.^26

Figure 9. Redundancy, R, per Letter as a Function
of the Number, n, of Letters. (Horizontal axis: value of n.)


6.4.  A  Statistical  Model  of  Language  on  the  Level  of 
Meaning  Units 

Up  to  now,  we  have  dealt  with  language  only  on  the  levels  of 
phonemes  or  letters.  Attempts  to  construct  statistical  models  of 
language  on  the  level  of  meaning  units  are  most  interesting 
[24].  As  we  have  already  said,  construction  of  statistical  models 
of  language  at  this  level  is  complicated  by  the  difficulty  of  ob- 
taining statistical  data,  or  even  an  alphabet  of  units,  on  this 
level.  These  difficulties  can  be  overcome,  however,  if  a  highly 
formalized  language  is  studied;  in  this  case,  the  "language"  of 
pilots  was  analyzed,  or  rather  the  language  used  in  a  strictly 
confined  situation:  The  notes  used  for  the  research  were  record- 
ings of  radio  conversations  between  pilot  and  control  tower  dur- 
ing landing.  (Notes  were  made  during  100  landings.) 

Neither the word, the morpheme, nor any other element usually
dealt with by linguists was considered to be an elementary
(nonexpanded) symbol. The authors felt that a message was
rather a sequence of "semantic messages," or "elements of
information," i.e., more or less complete elementary utterances.
A list of these elements was composed in the following way. The
landing process was divided into several parts or "situations";
for each such situation, a complete list of possible utterances
was made, and then the utterances were split into semantic
elements. In language expression, "semantic elements" are usually
short statements having a highly specific structure from the
standpoint of ordinary language; sometimes these are individual
words. Thus, the utterance "Langley Towers || aircraft 611,
|| two miles out, || landing instructions, please" is split into
four such elements (their bounds are marked by double vertical
lines).


^26 For n = 32 and n = 64, the experiment was repeated, as called for
by the fact that the redundancy for n = 64 was lower than the redundancy for
n = 32.
The authors do not formulate criteria by which to be guided
when splitting utterances into elementary semantic units; they
consider it sufficient to indicate that such criteria depend on
the intuition of people familiar with this "language," and that
this intuition, in most cases, led different people to identical
results. We emphasize the fact that a list of "content elements"
is being discussed, not the division of some language units or
other; for this reason, such synonymous expressions as gear
down,^27 gear green, three green, gear down and locked with three
green lights (meaning "go more slowly") are considered to be the
same unit.^28

^27 [Here the original has glar down, apparently a misprint. Tr.]
^28 Numerals presented a special problem in drawing up the list. In order not
to complicate the list of semantic elements, numerical information is
handled separately.

After a list of the semantic elements in all messages had been
made, the redundancy caused by differences in frequency and
by restrictions on combinability of elements was determined.
The total redundancy is 80 per cent (the message is understood
to be everything said by the pilot in the course of the entire
landing); the redundancy caused by unequal probabilities is
about 30 per cent, while that caused by dependencies among
the elements is 50 per cent. In calculating redundancy, the
authors took, for Hmax, the information contained in one semantic
element under the conditions of equal probability and
independence of elements (Hmax = 9 u); from the total number of
messages that could have been transmitted at the observed rate,
however, one reaches an even higher figure for redundancy. To
determine over-all redundancy, one must change over to a count
of the quantity of information in a section of phonemes or
letters corresponding to one semantic element. From the authors'
calculations, the redundancy is R = 0.8, whence we compute
that the relative information Hrel = 1 - R = 0.2 = 20 per cent;
actually, the value of the relative information has to be
decreased by a factor of 4. If we assume that a 10-letter sequence
corresponds, on the average, to a semantic element, then 9 u
[9 bits. Tr.] per semantic element is equivalent to 1.1 u per
letter. The maximum amount of information per letter
attainable with 26 letters is 4.70 u, i.e., about four times more.
Consequently, the final value for the relative information is about
5 per cent, whereas the total redundancy amounts to
approximately 95 per cent (which corresponds with the data of other
research on the same "language" [23], using a statistical
analysis of utterances written as strings of letters).
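
The arithmetic of this passage can be retraced in a few lines of Python. The variable names are invented for illustration; the figures 0.8, 1.1 u, and 26 letters are those quoted in the text, and the way the two factors are combined follows the reasoning "decreased by a factor of 4."

```python
import math

# Relative information at the level of semantic elements: 1 - R.
h_rel_elements = 1 - 0.8

# Information per letter implied by the text, and the maximum
# attainable with 26 equiprobable, independent letters.
bits_per_letter = 1.1
max_bits_per_letter = math.log2(26)  # about 4.70 u

# The letter-level ratio (roughly 1/4) scales the relative information down.
h_rel_total = h_rel_elements * bits_per_letter / max_bits_per_letter
total_redundancy = 1 - h_rel_total
```

The result is a relative information of about 0.047 and a total redundancy of about 0.95, in line with the rounded figures of 5 and 95 per cent.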

The  sharp  increase  in  redundancy,  as  compared  with  the 
level  generally  characteristic  for  language,  is,  of  course,  related 
to  the  fact  that  a  highly  specialized  language  was  subjected  to 
analysis.  Data  on  redundancy  in  specialized  languages  are  very 
interesting  to  linguists,  especially  from  the  following  stand- 
point: The  main  cause  of  redundancy  in  a  language  is,  as  we 
have  explained,  the  fact  that  a  message  is  produced  in  a  nar- 
rowly restricted  situation.  A  comparison  of  the  redundancy  in 
different  specialized  languages  might  well  aid  in  an  estimation, 
however  approximate,  of  what  fraction  of  text  redundancy  must 
be  ascribed  to  the  influence  of  the  situation,  i.e.,  to  restrictions 
outside  of  the  code. 


7.  Linguistic  Interpretations  of  Information  and 
Redundancy 

Data  on  amount  of  information  and  redundancy  are  undoubt- 
edly most  valuable  to  communications  technology;  in  human  so- 
ciety, information  is  transmitted  constantly  and  in  great  quan- 
tity. Most  of  this  information  is  expressed  via  some  one  of  the 
natural  languages.  Language  code  possesses  a  very  high  degree  of 
redundancy;  for  this  reason,  the  degree  of  economy  that  would 
be  accomplished  from  the  development  of  effective  codes  for 
natural  languages  is  enormous.  However,  we  shall  not  pursue 
the  technical  application  of  data  on  the  quantity  of  information 
and  redundancy  in  language  (e.g.,  see  [3]  and  [11]),  but  rather 
we  shall  take  a  look  at  several  of  the  possibilities  that  infor- 
mation theory  opens  up  for  linguistics.  These  possibilities 
are  not  entirely  obvious,  and  it  is  difficult  to  fit  them  into  a  co- 
herent system.  Therefore,  we  shall  confine  ourselves  to  indicat- 
ing various  methods  and  problems.  In  the  majority  of  cases, 
final  results  are  still  not  available,  so  that  one  can  only  speak  of 
a  more  or  less  valid  approach  to  the  problem. 
7.1. A Comparison of Languages from the
Standpoint  of  Redundancy 

The  levels  of  redundancy  of  various  languages  apparently  lie 
within  certain  narrow  limits,  permitting  one  to  speak  of  a  level 
of  redundancy  of  60  to  70  per  cent  as  a  universal  property  of 
language  in  general  (of  course,  the  present  data  are  restricted 
to  European  languages).  This  fact  is  interesting  for  several  rea- 
sons; in  particular,  one  can  place  it  in  immediate  relationship 
with  another  quantitative  regularity:  The  data  on  a  rather  large 
number  of  languages  suggest  that  in  any  language  the  average 
length of the minimum meaningful unit (the morpheme) is
inversely proportional to the number of phonemes in the
phonological system.^29 We can easily illustrate this relationship with
the  following  "extreme"  cases.  In  Hawaiian,  whose  phonologi- 
cal system  contains  only  13  phonemes  in  all,  most  morphemes 
consist  of  two  syllables  (i.e.,  4  phonemes,  on  the  average);  more- 
over, very  many  morphemes  consist  of  three  or  more  syllables, 
while  there  are  almost  no  one-syllable  morphemes.  On  the  other 
hand,  in  some  Caucasian  languages  containing  around  70  pho- 
nemes, almost  every  consecutive  phoneme  in  a  word  forms  an 
independent  morpheme.  For  all  the  cases  in  between,  about  the 
same  ratio  obtains.  The  presence  of  such  a  relationship  is  nat- 
ural, if  one  approaches  language  as  a  "rationally  constructed 
code";  the  morphemes  in  a  language  must  be  "encoded"  with 
strings of phonemes. All morphemes could be distinguished
from one another by means of a small number of phonemes
(two would do); but, in this case, the length of the code
combination for each morpheme would be very great. The more
phonemes in a phonological system, the shorter the morphemes
must be, or else coding would be too redundant. Thus, if a
tendency really exists for redundancy to be constant in language,
the relation indicated is a quite natural consequence of this
tendency. Unfortunately, our conclusions regarding
redundancy in various languages are still quite incomplete. In order
to unify the different quantitative regularities of languages into
one system, it is very important that we investigate whether
there exists any connection among redundancy, number of
phonemes, and average word length in languages.

^29 See C. F. Hockett, A Course in Modern Linguistics, The Macmillan Co.,
New York, 1958, p. 93.
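
The coding argument can be made concrete with a small Python sketch. It treats language as an idealized code with no restrictions on how phonemes combine, which real languages do not satisfy; the function name and the figure of 4,000 morphemes are hypothetical, while the inventories of 2, 13, and 70 phonemes are the cases mentioned in the text.

```python
def min_code_length(num_morphemes, num_phonemes):
    """Shortest fixed length L with num_phonemes ** L >= num_morphemes."""
    if num_phonemes < 2 or num_morphemes < 1:
        raise ValueError("need at least 2 phonemes and 1 morpheme")
    length = 1
    while num_phonemes ** length < num_morphemes:
        length += 1
    return length


# Hypothetical stock of 4,000 morphemes encoded with different inventories.
lengths = {n: min_code_length(4000, n) for n in (2, 13, 70)}
```

Under these assumptions the required length falls from 12 symbols with two phonemes to 4 with thirteen and 2 with seventy, which is the direction of the relationship described above; actual morpheme lengths exceed such minima because natural coding is redundant.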

Of  course,  it  is  evident  that  one  can  speak  of  the  relative  near- 
ness of  the  levels  of  redundancy  in  various  languages  only  as  a 
general  tendency  from  which  divergencies  cannot  fail  to  occur, 
while  the  differences  among  languages  from  the  standpoint  of 
redundancy  are  most  interesting  in  themselves.  A  language's  re- 
dundancy is  the  result  of  restrictions  on  the  combinability  of 
units  and  is,  therefore,  immediately  connected  with  the  presence 
in  the  language  of  a  definite  structure  (in  the  sense  discussed 
on  p.  137).  Thus,  redundancy  can  be  a  quantitative  meas- 
ure of  the  "structuredness"  of  language  and  can  form  the  basis 
for a typological comparison of different languages. It would
be very interesting, for example, to compare the redundancies
in languages that assign different roles to morphology within the
grammatical structure, especially Russian and English. The
present data (such as the fact that H3 is 9 per cent greater for
written Russian than for written English) lead us to suppose
that the redundancy curve for Russian will increase significantly
more rapidly than that for English.

A  comparison  of  various  languages  in  terms  of  their  redun- 
dancies would  give  a  more  precise  meaning  to  evaluations  of 
the  relative  effectiveness  of  those  languages,  or  of  one  language 
at  different  stages  in  its  development,  and  would  make  the  con- 
tent of  the  concept  of  "progress  in  language"  more  exact.  The 
idea of "progress in language" occupies a large place in O.
Jespersen's works.^30 But, in practice, Jespersen identified progress
in language with the tendency to "analytism," by which he
meant a decrease in the use of morphology for the expression
of syntactic bonds in favor of word order. It would be
interesting to explain how the qualitative changes described by
Jespersen are connected with change in the value of redundancy in a
language, and whether there exist certain regularities for the
change in redundancy during the development of different
languages. An explication of such quantitative regularities would
allow one to predict the direction of further development of a
language, beginning with a quantitative analysis of its
contemporary condition.

^30 See O. Jespersen, Efficiency in Linguistic Change, E. Munksgaard,
Copenhagen, 1941.

One  could  also  ask  about  the  coding  effectiveness  of  various 
languages  for  the  "exact  same  content"  (see  bibliography  for 
Chapters  I  and  II,  [25]).  If  we  believe  that  the  content  of  a  text 
remains  the  same  when  translated  into  another  language,  then 
the  relationship  between  the  quantities  of  information  in  texts, 
where  one  is  the  translation  of  the  other,  must  indicate  the  "se- 
mantic weight"  of  one  unit  of  information  in  the  different  lan- 
guages. (The  amount  of  information  in  a  text,  which  is  in  no 
way  connected  with  the  content  of  that  text,  is  determined  from 
the  number  of  symbols  in  the  alphabet  and  by  the  statistical 
regularities  of  combinations  in  the  corresponding  languages, 
and  naturally  does  not  remain  constant  in  translation.) 

However,  the  following  interesting  fact  was  found  upon  com- 
parison of  the  amounts  of  information  in  two  texts  that  were 
identical  in  content— one  in  German  and  the  other  in  English: 
If  the  original  is  in  English,  then  for  each  unit  of  information 
in it, there are 1.22 u in the German;^31 but if the original is in
German, and the English is a translation, then for each unit of
information in the English text, there are only 1.07 u in
German. Thus, the relation of the two texts, with respect to
quantity of information, depends on the direction of the translation.
We can assume that every process of translation leads to a
relative lengthening of the text, since even the text content does
not remain the same in translation. Because it is impossible to
convey the text content exactly through the medium of another
language, every translation becomes in part an explanation.
Accordingly, a decided increase in information occurs. If we
assume that in translating from English to German and back
again, the increase in information remains the same, then we
can determine both the value of this increase and the semantic
weight of a unit of information in these languages as
independent of the direction of translation: Let x u of German stand for
1 u of information in English (then per 1 u of German we have
1/x in English); furthermore, let the increase in information
during translation be, for each unit of information of the
initial text, y u. Then, one can construct the following system of
equations:

$$\begin{cases} x + y = 1.22, \\ 1.07\left(\dfrac{1}{x} + y\right) = 1.00, \end{cases}$$

whence x ≈ 1.15; y ≈ 0.06.

^31 The quantity of information in a text was determined with the aid of
formula (3): $H^{(m)} = mH_1$, where m is the number of letters in a text (i.e.,
assuming that the letters are statistically independent). The values of H1
differ quite insignificantly for English and German (from the authors' data,
H1 for English is 4.08, and for German, it is 4.07). More accurate
approximations to the amount of information per letter will apparently be
more indicative, because for German and English, they differ to a much
higher degree.
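
The solution of the system can be checked with a short Python sketch. It assumes that the second equation reads 1.07(1/x + y) = 1.00; the function name and its parameters are invented for illustration. Subtracting the second equation (divided by 1.07) from the first eliminates y and leaves a quadratic in x.

```python
import math


def solve_translation_system(forward=1.22, backward=1.07):
    """Solve x + y = forward and backward * (1/x + y) = 1 for x > 0.

    Subtracting gives x - 1/x = d, i.e. x**2 - d*x - 1 = 0,
    with d = forward - 1/backward.
    """
    d = forward - 1 / backward
    x = (d + math.sqrt(d * d + 4)) / 2  # positive root
    y = forward - x
    return x, y


x, y = solve_translation_system()
```

Under this assumption the sketch gives x close to 1.15 and y between 0.06 and 0.07, which agrees with the rounded values in the text.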

Although the data obtained are not final, research in the
direction indicated is undoubtedly most interesting. It would be
very important to determine the values of both variables, i.e.,
the semantic weight of a unit of information and the
proportional increase in information in translation. Obviously,
it would also be desirable to create conditions for determining
these values individually. Thus, one may suppose that scientific
texts containing strictly defined terminology, such as
mathematical texts, allow a maximum adequacy of translation, i.e.,
translation causes a minimal addition of information. In
translating such text, the main change in quantity of information
will be caused by the statistical structures of the codes involved.
If we know how the amount of information changes in relation
to the differences in the codes, we can then compare the growth
of information for different texts or translators.


7.2. Amount of Information and the Functional Load
of Phonological Contrasts

The  concepts  of  amount  of  information  and  redundancy  as 
applied  to  language  must  not  be  thought  of  as  completely  new. 
Their  value  lies  primarily  in  their  close  relation  to  other  con- 
cepts, whose  necessity  was  recognized  by  linguists  long  before 
attempts  to  use  the  methods  of  information  theory  in  these  areas 
appeared.  There  is  an  incontestable  connection  between  the 
concepts  of  quantity  of  information  and  redundancy  and  the 
concept  of  functional  load  of  phonemes  or  phonemic  contrasts, 
the concept of strong and weak positions, and other phonological
concepts. Bringing linguistic concepts closer to those of
information theory can aid in making the former more precise.

The  concept  of  the  functional  load  of  a  phoneme  and  of 
phonological  contrast  is  undoubtedly  quite  productive  (see,  for 
example,  bibliography  for  Chapters  I  and  II  [7],  p.  78).  It  is 
especially  used  in  diachronic  phonology,  in  which  one  of  the 
main  theses  states  that  the  probability  of  loss  of  a  phonological 
contrast  is  inversely  related  to  its  functional  load,  since  the  loss 
of  a  contrast  that  plays  a  fundamental  role  in  language  has  a 
negative  influence  on  mutual  comprehension.  The  concept  of 
functional  load  is  not  exact  enough.  Functional  load  is  usually 
connected with the number of minimal pairs containing a
certain contrast (i.e., of pairs like dom-tom, byl-pyl, etc.),
although such a method of evaluation is usually unsuccessful (see
bibliography for Chapters I and II [7], pp. 79-83).

The  possibilities  of  differentiation  presented  by  a  particular 
phonological  system  as  a  whole  can  be  evaluated,  for  example, 
from  the  number  of  morphemes  of  a  certain  length  that  can  be 
distinguished  by  the  system.  These  possibilities  are  defined  by 
the  total  number  of  phonemes,  their  frequencies,  and  the  de- 
gree to  which  their  mutual  combinatory  capabilities  are  re- 
stricted. Obviously,  therefore,  one  can  evaluate  the  differentiat- 
ing possibilities  of  a  phonological  system  from  the  amount  of 
information  per  average  phoneme  in  the  system. 

When  phonological  contrast  is  lost,  two  phonemes,  or  several 
pairs  of  phonemes,  are  merged  into  one,  and  the  differentiating 
possibilities of the system decrease. For this reason, the
difference H - H* (where H is the amount of information per
phoneme in the initial system, and H* is the amount of
information in the system with one contrast lost) can be thought of as
the differentiating load that falls on the particular contrast. It
is clearly more appropriate to use the value (H - H*)/H,
which expresses the fraction of the functional load that falls on
the contrast.
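
The measure (H - H*)/H can be illustrated with a Python sketch. As a simplifying assumption, the amount of information per phoneme is estimated here from single-phoneme frequencies only, ignoring the dependencies between neighboring phonemes that a fuller calculation would include; the function names and the toy sequence are invented for the example.

```python
import math
from collections import Counter


def entropy_per_symbol(symbols):
    """Entropy in bits per symbol, from single-symbol relative frequencies."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def functional_load(symbols, a, b):
    """Relative loss (H - H*) / H when phonemes a and b are merged."""
    h = entropy_per_symbol(symbols)
    if h == 0:
        raise ValueError("entropy of the initial system is zero")
    merged = [a if s == b else s for s in symbols]
    h_star = entropy_per_symbol(merged)
    return (h - h_star) / h


# Toy system of four equally frequent phonemes (hypothetical data).
toy = list("ptkd" * 25)
load = functional_load(toy, "t", "d")
```

In this toy system H is 2 u, merging two of the four phonemes lowers it to 1.5 u, and the contrast therefore carries a quarter of the load; with a realistic inventory and frequencies the share of any single contrast is far smaller.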

Such a method for evaluating functional load was proposed
by Hockett,^32 who also asserted that the functional load so
calculated for each individual contrast is so small that the loss of
any contrast can hardly reduce mutual comprehension. This
assertion needs experimental verification.

^32 See C. F. Hockett, A Manual of Phonology, Waverly Press, Baltimore,
Maryland, 1955.


7.3.  A  Quantitative  Evaluation  of  Positions  in 
Phonology  and  Syntax 

In  studying  the  degree  of  use  of  phonological  contrast  in  lan- 
guage, such  concepts  as  that  of  the  "position  of  maximum  dif- 
ferentiation," "strong  and  weak  positions,"  etc.,  are  very  im- 
portant. Introduction  of  such  concepts  is  related  to  the  neces- 
sity of  separating  out  rules  that  prevent  the  realization  of  cer- 
tain phonemic  distinctions  (in  Russian,  for  example,  the  posi- 
tion in  an  unstressed  syllable  is  weak  for  vowels,  because  the 
number of contrasted vowels there decreases). However, the
differentiation of only two forms of position, weak and strong,
is clearly insufficient, since in many cases the actual picture is
much more complex. Thus, in French there exists no position
in which all ten non-nasalized vowels are distinguished. In a
nonfinal syllable of a word, the contrast between open [ɛ] and
closed [e] is neutralized: [ɛ] and [e] become positional
variants in the pronunciation of most Frenchmen (as with restons
[rɛstɔ̃] and laissons [lesɔ̃]); in the final open syllable, the
contrast between [ø] and [œ] is neutralized, as is that between
[o] and [ɔ], since [œ] and [ɔ] are impossible in the final position.
Many neutralizations likewise occur before the so-called
"lengthening" consonants [r], [z], [v], [z]^33 in the final closed
syllable.

It  hardly  seems  useful  to  call  any  of  these  positions  "strong," 
as  opposed  to  "weak."  It  is  not  immediately  clear  which  of  these 
positions  is  the  position  of  maximum  differentiation.  Neverthe- 
less, quantitative  comparison  of  positions  among  themselves  can 
turn  out  to  be  quite  essential  in  describing  the  phonological 
system.  The  positions  can  be  compared  by  their  degree  of 
"weakness,"  which  can  be  identified  with  redundancy  per  pho- 
neme in  certain  conditions. 

³³ [The [z] is repeated in the original text. Tr.]

In linguistic description, the characteristics of position can
vary: position before or after certain phonemes, position with
respect to stress, to end of word, etc. If we select one feature
characterizing position (the position after a particular phoneme),
then we can identify the differentiating possibilities of the
phoneme in this position with its conditional entropy. For
Russian, the conditional entropy of the first order is 3.82 bits;
since the maximum quantity of information with 42 phonemes is
5.38 bits, the redundancy of 29 per cent can be considered a
quantitative measure of the "weakness" of this position. There are
no strong positions in this language, in that sense.
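
The 29 per cent figure can be checked by direct arithmetic. The sketch below assumes that the quantities are in bits and that redundancy is taken as 1 − H/H_max with H_max = log2 of the inventory size; the function name is illustrative only.

```python
import math

def redundancy(entropy_bits, inventory_size):
    """Redundancy R = 1 - H / H_max, with H_max = log2(inventory size)."""
    h_max = math.log2(inventory_size)
    return 1.0 - entropy_bits / h_max

# Figures quoted in the text for Russian.
h_max_russian = math.log2(42)      # about 5.39 bits (the text gives 5.38)
r_russian = redundancy(3.82, 42)   # about 0.29
```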

An  evaluation  of  the  differentiating  possibilities  of  phonemes 
in  structurally  characterized  positions  would  be  interesting,  e.g., 
a  comparison  of  the  amount  of  information  in  a  phoneme  lo- 
cated at  the  beginning  or  end  of  a  word,  after  various  classes  of 
phonemes,  etc. 

In the area of syntax, several calculations have been made
that now permit a quantitative comparison of the various
syntactic positions in a sentence [12]. Research has been performed
on English materials in which sentences written as strings of
syntactic word classes were examined.³⁴ In a rather long text, all
sentences eleven to twenty-five words in length were selected.
The frequencies of various word classes in various positions in
these sentences were calculated; the position was identified by
the ordinal number of the word in the sentence. It was
determined that in all positions, the frequencies of various word
classes practically correspond with their over-all frequencies in
the text. The first and last positions, for which there is a very
large difference from the total frequency, are exceptional. Such
a result is entirely natural: Only in those positions are
structural restrictions imposed on the syntactic classes; thus, a
sentence cannot end with an auxiliary, and hardly ever begins with
a verb (the imperative is rarely found in written text), etc. The
redundancy in each position was calculated from the frequencies
obtained. Redundancy in English sentences is, on the average,
about three times greater in the first and last positions
than in the other positions, and it is somewhat greater in the
final position than in the initial.

³⁴ Fries' simplified classification was used (see C. Fries, The
Structure of English, Harcourt, Brace & Co., Inc., New York, 1952).
Words were divided into five classes: noun, adjective, verb,
adverb, and auxiliary.
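
The positional count described above can be sketched as follows. This is a minimal illustration, not the procedure of the cited study: the tag names, the separate key `"last"` for the final word, and the four toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# The five classes named in the footnote; the tag spellings are hypothetical.
CLASSES = ["N", "ADJ", "V", "ADV", "AUX"]

def positional_redundancy(sentences):
    """Map each position to the redundancy 1 - H/log2(5) of word classes there.

    Positions are 0, 1, 2, ... counted from the start of the sentence;
    the final word of every sentence is collected under the key "last".
    """
    counts = defaultdict(Counter)
    for tags in sentences:
        for i, tag in enumerate(tags[:-1]):
            counts[i][tag] += 1
        counts["last"][tags[-1]] += 1
    h_max = math.log2(len(CLASSES))
    result = {}
    for pos, c in counts.items():
        total = sum(c.values())
        h = -sum((n / total) * math.log2(n / total) for n in c.values())
        result[pos] = 1.0 - h / h_max
    return result

# Hypothetical tagged sentences, invented for illustration only.
toy = [
    ["N", "AUX", "V", "ADJ", "N"],
    ["N", "V", "ADV", "ADJ", "N"],
    ["ADJ", "N", "AUX", "V", "N"],
    ["N", "V", "N", "ADV", "ADV"],
]
red = positional_redundancy(toy)
```

In the toy data the first and last positions are dominated by one class, so their redundancy is higher than that of the middle positions, which mirrors the pattern reported in the text.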


166    Information  Theory  and  the  Study  of  Language 

7.4.  Entropy  and  the  Determination  of  Boundaries 
between Linguistic Units in Text

Attempts  to  apply  the  concept  of  entropy  to  the  definition  of 
boundaries  between  text  elements  deserve  attention,  especially 
those  attempts  to  define  the  boundaries  between  elements  of  a 
higher  level  in  a  text  that  is  described  as  a  sequence  of  units  of 
a  lower  level  (e.g.,  the  boundaries  between  morphemes  or 
words in a sequence of phonemes). The first steps in this
direction were taken in properly linguistic terms and were not
immediately connected with a statistical approach to language.³⁵
Harris,  for  example,  described  the  following  procedure  for  text 
analysis.  An  utterance  is  taken,  for  example:  "He  is  clever," 
transcribed  phonologically  as  [hiyzklever].  All  possible  sen- 
tences beginning  with  the  phoneme  [h]  are  selected.  The  sen- 
tences can  be  drawn  from  a  sufficiently  large  text  or  simply  in- 
vented. A  count  is  made  of  the  number  of  phonemes  that  can 
follow  this  particular  phoneme;  they  are  called  "descendants" 
of  that  phoneme.  The  number  of  descendants  of  the  phoneme 
[h],  according  to  the  calculations  of  the  author,  is  nine.  Then, 
sentences  beginning  with  the  same  two  phonemes  as  the  exam- 
ple sentence,  i.e.,  with  the  combination  [hi],  are  selected  in  an 
analogous  fashion.  At  various  points  in  the  utterance,  the  num- 
ber of  descendants  changes  periodically— it  falls,  then  rises, 
forms  a  peak,  falls  again,  etc.  Only  strictly  limited  phoneme 
combinations  are  possible  within  a  morpheme,  although  the  fol- 
lowing morpheme  can  begin  with  almost  any  phoneme.  Thus, 
one  can  assume  that  a  phoneme  whose  descendants  form  a  peak 
is  the  last  phoneme  in  a  morpheme  and,  therefore,  that  the  pro- 
cedure described  yields  the  splitting  of  text  into  morphemes. 
For the example given, the distribution of descendants is as
follows:

[Table: number of descendants after each successive phoneme of
[hiyzklever]; the values are not recoverable here.]

The number of descendants attains a maximum after [hiy]
and [hiyz], which corresponds with the real boundary between
the morphemes in this sentence.

³⁵ See Z. S. Harris, "From Phoneme to Morpheme," Language, Vol. 31,
No. 2, 1955, pp. 190-222; see also S. Chatman, "Immediate
Constituents and Expansion Analysis," Word, Vol. 11, No. 3, 1955,
pp. 377-391.
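
The counting of descendants can be sketched in a few lines. The sketch makes several simplifying assumptions that are not in the source: utterances are plain strings with one character per phoneme and no spaces, the corpus is a small invented list, and a cut is placed after the last point of a plateau (a count at least as large as the preceding one and strictly larger than the following one).

```python
def successor_counts(utterance, corpus):
    """For each prefix of `utterance`, count the distinct symbols that
    follow that prefix at the start of some utterance in `corpus`."""
    counts = []
    for k in range(1, len(utterance) + 1):
        prefix = utterance[:k]
        followers = {u[k] for u in corpus if len(u) > k and u[:k] == prefix}
        counts.append(len(followers))
    return counts

def peak_positions(counts):
    """Indices where the count is a peak or the end of a plateau."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] >= counts[i - 1] and counts[i] > counts[i + 1]]

def segment(utterance, cut_after):
    """Split `utterance` after each index listed in `cut_after`."""
    pieces, start = [], 0
    for i in cut_after:
        pieces.append(utterance[start:i + 1])
        start = i + 1
    pieces.append(utterance[start:])
    return pieces

# Hypothetical corpus in ordinary spelling, invented for illustration.
corpus = ["thecatsat", "thecatsatdown", "thecatran", "thecatate",
          "thecar", "thedog", "theend", "thin", "that", "tom"]
counts = successor_counts("thecatsat", corpus)
pieces = segment("thecatsat", peak_positions(counts))
```

With this toy corpus the counts rise to a plateau after "the" and peak again after "thecat", so the utterance is cut into "the", "cat", and "sat".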

It is evident that calculation of the number of descendants
amounts to an attempt to find a numerical expression for the
indefiniteness of the following phoneme at a given point in the
utterance, and information theory proposes to call this
indefiniteness entropy. The possibilities of relating the boundaries
of words or morphemes to peaks in entropy have been considered
for a sequence of phonemes.³⁶ Suppose we have the sentence
"Let me in," transcribed as [let mi in]. The value $H_{a_i}$ is
determined for each phoneme (see formula (4), p. 129). At first,
$a_i$ is the phoneme [l], then the combination [le], etc. (The
author does not state explicitly that the calculated value is
not the entropy or the conditional entropy used by information
theory but, rather, the quantity $H_{a_i}$, i.e., the amount of
information under a fixed condition. The calculation of entropy
or conditional entropy, i.e., the quantity $H_n$, would not lead
to any useful results for our problem: $H_n$ is the same at all
points in the utterance, and the conditional entropy decreases
regularly as n increases; see Figure 8.)

For our example, the values of $H_{a_i}$ are as follows:

Phoneme       l      e      t      m      i      i      n
$H_{a_i}$    .81    .36    .55    .15    .84    .22     -

(The value under each phoneme is $H_{a_i}$, i.e., the
indefiniteness of the following phoneme.) The growth of
$H_{a_i}$ after [let] and [letmi] corresponds to the boundaries
between morphemes in this sentence. This method was verified
for 100 sentences and, from the author's calculations, gave
positive results in about 85 per cent of the cases.
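
A quantity of this kind, the uncertainty of the next symbol under a fixed preceding string, can be estimated as sketched below. The sketch assumes that it is computed as −Σ p(x | a) log2 p(x | a) over the symbols x that follow the fixed string a; formula (4) itself is not reproduced in this section, so this reading is an assumption, and the corpus is invented.

```python
import math
from collections import Counter

def prefix_entropy(prefix, corpus):
    """Entropy in bits of the symbol following `prefix`, estimated from
    the utterances in `corpus` that begin with `prefix`."""
    k = len(prefix)
    followers = Counter(u[k] for u in corpus
                        if len(u) > k and u.startswith(prefix))
    total = sum(followers.values())
    if total == 0:
        return None  # the prefix is never continued in the corpus
    return -sum((c / total) * math.log2(c / total)
                for c in followers.values())

def entropy_profile(utterance, corpus):
    """Entropy of the next symbol after each successive prefix."""
    return [prefix_entropy(utterance[:k], corpus)
            for k in range(1, len(utterance) + 1)]

# Hypothetical corpus, one character per phoneme, for illustration only.
corpus = ["letmein", "letmeout", "letmego", "letgo", "letitbe",
          "lend", "lit"]
profile = entropy_profile("letme", corpus)
```

In this toy corpus the estimate rises after "let" and again after "letme", and falls to zero after "letm", where only one continuation is attested.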

However, one cannot consider this direction of research
productive. One of the reasons for lack of success in attempts to
define boundaries between elements as points of increase in
entropy is connected with the complexity of the language code,
i.e., with the presence of several levels in it. Certain restrictions
are imposed on text written as a sequence of phonemes; these
limitations are connected with the presence not only of
morphological, but also of syllabic and syntactic, structures in
language. Therefore, in attempting to relate entropy peaks to
morpheme boundaries, one finds, on the one hand, excess
peaks corresponding to syllable boundaries and, on the other
hand, missing peaks at some morpheme boundaries, since
morpheme combinability is limited at the syntactic level (e.g.,
in the sentence It kills me, there will not be a peak after [itkil],
since only the morphemes -s and -ed are possible after the
morpheme kill-, according to the given syntactic conditions).

³⁶ See C. S. Chomsky, "The Determination of Word Boundaries in
Phonemic Sequences," Collection of Papers Presented at the Seminar
in Mathematical Linguistics, Harvard University, Cambridge,
Massachusetts, 1955 (mimeographed).

We note in addition that experiments can only be performed
on very short sentences, since greater precision in the condition
(i.e., lengthening of $a_i$) is connected with a decrease in
$H_{a_i}$, and the tendency toward the latter's periodic growth at
the boundaries of linguistic units disappears against this
background. The main argument against this approach to defining
the boundaries between language units is that probabilistic
methods can only give the most probable boundaries, whereas the
pieces of text having the "highest probability of being linguistic
units" are hardly of interest to linguists. For example, if we
analyzed not the sentence used as an example but one beginning
with the word letters, we would obtain the fragment [let], to
which nothing corresponds on the level of content, and which is
consequently not the unit we are seeking.


7.5.  Redundancy,  Interference,  and  the 
Problem  of  Ideality 

Consciousness of the high degree of redundancy in the language
code apparently forces us to reexamine several problems of
phonology. Until recently, phonology clearly did not adequately
account for the observation that sound distinctions fulfill a
differentiating function only when they are sufficiently distinct
to the listener's ear. In evaluating the differentiating
capabilities of phonemes, one usually begins with a description of
the articulation of the sounds and not with the actual capacity of
the hearer to perceive them as different signals.³⁷ Several studies
have shown that articulation differences between sounds
frequently do not function as an ideal guarantee of their always
being distinguished by the hearer. For example, it has been
established³⁸ that the distinctiveness of the phonemes [p], [t],
and [k] in the final position of a word in English is
considerably less than 100 per cent³⁹ (the experiments were
conducted in ideal conditions of audibility; however, the listeners
could not see the speaker's lip movements). Also of interest are
data on the imperfect distinguishability of English vowel
phonemes.⁴⁰

If  the  probability  of  distortion  of  an  individual  signal  is  high 
enough  even  under  ideal  conditions  of  audibility,  then  it  in- 
creases in  the  "average"  situation  of  communication,  and  in 
some  instances,  it  rises  quite  significantly. 

Proper relation of each sound with a phoneme is not hindered
by "external" circumstances alone (noise in the environment,
narrowing of the frequency band, and other static in the
transmitting apparatus). Great distortions are caused, for example,
when the rate of speech increases; the sentence Aleksandr
Aleksandrovich vchera byl v teatre [Alexander Alexandrovitch
was at the theater yesterday] might sound like something
transcribable as [sansancrab''lt''atr^'].⁴¹ Likewise, a constant
source of distortion of individual signals is the imperfect
correspondence of the phonetic codes of speaker and listener caused,
for example, by differences in dialect. Thus, in a real situation of
language communication, not every segment of speech corresponds
to a phoneme. Because of the redundancy of language, however,
to distinguish (i.e., "comprehend") a word or sentence upon
hearing it, it is not at all necessary to perceive every one of its
phonemes precisely; the presence of syntactic bonds among
phonemes permits a loss of some phonemic oppositions without a
loss of comprehensibility, just as the multivalence or even
homonymy of individual words does not, as a rule, hinder
comprehension of the sense of the sentence as a whole, because of
the presence of bonds among the words. Redundancy can be
represented as that part of information which is already known to
the hearer if he knows the statistical regularities of the
particular code. The worse the auditory conditions, the greater the
role played, relatively speaking, by redundant information
during perception.

³⁷ See H. Mol and E. M. Uhlenbeck, "Hearing and the Concept of
Phoneme," Lingua, Vol. 8, No. 2, 1959, pp. 161-185.

³⁸ See F. W. Householder, "Unreleased p, t, k, in American
English," in For Roman Jakobson, Mouton & Co., 's-Gravenhage, The
Netherlands, 1956, pp. 235-248.

³⁹ Distinctiveness is evaluated with the aid of tables of
meaningless words; regarding this, see, for example, L. R. Zinder,
"Russkie artikulyatsionnye tablitsy" [Tables of Russian
Articulation], Trudy voennoj akademii im. S. M. Budennogo,
Collection 29-30, Leningrad, 1951, pp. 35-40.

⁴⁰ See H. L. Barney and H. K. Dunn, "Speech Analysis," in Manual of
Phonetics, L. Kaiser, ed., North-Holland Publishing Co., Amsterdam,
1957. See also G. A. Miller and P. Nicely, "An Analysis of
Perceptual Confusions Among Some English Consonants," Journal of the
Acoustical Society of America, Vol. 27, No. 2, 1955, pp. 338-352.

⁴¹ [In transliteration, ' stands for palatalization, ' for ь, and
" for ъ. Tr.]

In  order  to  get  some  idea  of  the  kind  of  role  that  redundancy 
plays  in  speech  comprehension,  we  shall  consider  a  situation  in 
which  redundancy  decreases  significantly  and  the  main  load  is 
taken  up  by  the  differentiating  capacities  of  the  phonemes.  An 
example  of  such  a  situation  is  presented  by  R.  O.  Jakobson  and 
M.  Halle  [27];  under  some  particular  circumstances,  the  lis- 
tener is  told  the  last  names  of  people  with  whom  he  is  not  ac- 
quainted, or  other  proper  nouns  he  does  not  know.  In  this  case, 
unlike  others,  neither  the  speech  context  nor  that  of  the  situa- 
tion can  aid  him  in  comprehending  the  message:  The  proper 
names  are  not  found  in  the  listener's  vocabulary;  moreover,  in 
last names and proper nouns, the laws of phoneme combinability
are often broken (as in the names Przewalsky, Bkhilai,
etc.⁴²). In this instance, the average redundancy per phoneme is
much  less  than  the  average  level  characteristic  for  the  language 
as  a  whole.  As  a  result,  the  idealization  that  can  safeguard  a 
language  in  such  a  situation  is  clearly  insufficient.  (For  exam- 
ple, in  the  course  of  a  person's  education,  proper  nouns  are 
written  on  the  board;  otherwise,  a  hearer  would  have  only  a 
very  approximate  idea  of  their  phonology.) 

If the statistical bonds among phonemes play such a great
role in speech comprehension "at the phonemic level," then
speech is not perceived "phonemically" (hence the lack of
success in attempts to construct machines for describing oral
speech⁴³). If language were perceived phoneme-by-phoneme,
then the correct transmission of information via language would
be impossible. In fact, a very complex process of "decoding"
takes place during the perception of a language message: During
decoding "at the phoneme level," certain signals can be
incorrectly comprehended; but since words make up only a small
part of the possible combinations of signals, even a small group
of undistorted signals can permit correction of the error most of
the time. It may happen that there are not enough of these
correctly received signals to correct the error, and the entire
word is mistaken. But combinations of words are not arbitrary;
thus, the error can be corrected at a higher level. For just this
reason, a language message is not comprehended immediately
after each successive sound has been pronounced, but with some
delay (for some experimental data on this matter, see [7]).

⁴² [Of course, the author of this chapter is speaking here from the
point of view of the Great Russian phonetic system. Tr.]

More  complexity  in  the  coding  system  is  apparently  the  main 
way  to  optimize  technical  communication.  These  systems  will 
thus be models of the process by which human beings
communicate using language.⁴⁴


8.  The  Tendency  toward  Optimality  in  Language  Codes 

Up  to  now,  we  have  spoken  only  of  the  bonds  among  units  on 
the  same  level.  It  is  also  of  interest  to  note  the  relation  among 
units  of  this  and  a  higher  level,  i.e.,  the  means  of  constructing 
larger  units  from  elementary  symbols. 

We  have  explained  that  language  is  distinguished  by  a  great 
deal  of  redundancy  that  is  the  consequence  of  its  "structured- 
ness"  and  that  guarantees  its  close  approach  to  the  ideal.  Even 
more  interesting  are  tendencies  of  an  oppositional  character  in 
other  aspects  of  language,  namely,  in  the  formation  of  larger 
units  from  small  ones;  here,  language  proves  to  be  highly  effec- 
tive and,  at  least  in  some  respects,  close  to  the  optimal  code. 

We shall note the latter characteristic at the level of the
relationship between distinctive features and phonemes, and also at
the level of the relationship between letters and words.

⁴³ For example, see D. B. Fry and P. Denes, "Experiments in
Mechanical Speech Recognition," in Information Theory, C. Cherry,
ed., Academic Press Inc., New York, 1956, pp. 206-212.

⁴⁴ See R. M. Fano, "The Challenge of Digital Communication," IRE
Transactions on Information Theory, Vol. IT-4, No. 2, June, 1958,
pp. 63-64.

Data on this matter (for Russian) are contained in C. Cherry,
M. Halle, and R. O. Jakobson's work [18]. The total number
of distinctive features taking part in the differentiation of
Russian phonemes is 11 (see [18]). Each phoneme can be
unambiguously characterized by means of such features as
"voicedness" and "vowel quality." In general, one can distinguish
$2^{11}$ = 2,048 phonemes, using the 11 features, and, consequently,
the total number of distinctive features is much larger than the
minimum necessary for distinguishing Russian phonemes from
one another. However, not all features participate in the
identification of each phoneme, but only some of them; thus, the
feature of "voicedness" is not distinctive for the phoneme [č]:
Although the phoneme [č] is unvoiced, phonetically speaking,
still the corresponding voiced sound (encountered, for example,
at the end of the word szhech' in the combination szhech' by)
is not an independent phoneme in Russian; therefore, the
"voicedness" of [č] has no differentiating function; the feature
of "nasalization" is distinctive for only a very small number of
phonemes, since there are no nasal vowels in Russian nor any
sound like [ŋ], etc. Therefore, one can construct the following
model of a phoneme: Phonemes are coded with strings of
distinctive features; all of these are binary, i.e., they can take
the values 0 or 1 (e.g., the values of the feature of "voicedness"
are as follows: voiced, 1; unvoiced, 0). The number of symbols in
the string is variable; e.g., in identifying the phoneme [a] four
features are relevant, while nine apply to the phoneme [t]
(see Table 3, p. 150). Phonemes occur in speech with unequal
frequencies; therefore, the "coding" of phonemes by distinctive
features will be optimized if the shortest code combinations are
written for the most frequent phonemes, and, conversely, the
longest for the least frequent. The degree of approximation of
the phoneme code to the optimum can be evaluated by the
degree to which the average number of distinctive features per
phoneme occurring in text approaches the amount of
information in the phoneme (we have in mind the value $H_1$). The
average number of distinctive features per phoneme occurring in
text, according to Cherry et al., is 5.79, while $H_1$ for Russian is
4.78; as we see, the divergence of the real figure from the
optimum is not so very great.
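
The comparison between the average number of features and the entropy can be made concrete with a small calculation. In the sketch below, only the two figures 5.79 and 4.78 come from the text; the four-phoneme system, its frequencies, and its code lengths are hypothetical and serve only to show how the two quantities are obtained.

```python
import math

def average_code_length(freq, length):
    """Mean number of binary features per phoneme occurrence in text."""
    total = sum(freq.values())
    return sum(freq[p] * length[p] for p in freq) / total

def entropy_bits(freq):
    """Zero-order entropy in bits per phoneme."""
    total = sum(freq.values())
    return -sum((f / total) * math.log2(f / total)
                for f in freq.values() if f > 0)

# Ratio of the figures quoted in the text for Russian.
efficiency_russian = 4.78 / 5.79   # about 0.83

# Hypothetical four-phoneme system, for illustration only:
# the most frequent phoneme gets the shortest feature string.
toy_freq = {"a": 50, "t": 25, "n": 15, "k": 10}
toy_len = {"a": 1, "t": 2, "n": 3, "k": 3}
avg = average_code_length(toy_freq, toy_len)
h1 = entropy_bits(toy_freq)
```

For any uniquely decodable binary code the average length cannot fall below the entropy, so the ratio of entropy to average length gives one plain measure of how close the code is to the optimum.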



It  is  not  hard  to  comprehend  that  this  statistical  regularity  is 
the result of very simple facts.⁴⁵ The total number of vowels in
the  Russian  phonological  system,  as  in  many  other  systems,  is 
much  lower  than  the  number  of  consonants.  Therefore,  the  fea- 
ture of  vowel  versus  consonant  divides  all  phonemes  into  two 
unequal  groups:  Among  the  vowels,  the  number  of  phonemes 
is  considerably  less  than  it  is  among  the  consonants.  Hence,  the 
number  of  features  needed  to  identify  a  vowel  phoneme  is  less 
than  the  number  needed  for  a  consonant.  Since  there  must  be 
a  vowel  in  every  syllable,  the  relative  frequency  of  vowels  is,  on 
the  average,  higher  than  that  for  consonants.  Thus,  the  rela- 
tively large  frequency  and  small  number  of  differentiating  fea- 
tures for  vowels  is  a  main  source  of  optimality  of  the  phoneme 
code. 


[Figure 10. Word Frequency as a Function of Word Length. Axes:
number of words in a text, against number of letters per word.]


On the subject of the relation between word length and word
frequency, Miller et al. obtained interesting data ([33], [34]).
The research was performed on English-language materials. The
results are shown in Figure 10, where word frequency is
presented as a function of word length (counts were made on a text
36,300 words in length). The experimental curve rather closely
approaches the curve of inverse proportionality. This kind of
distribution of the letters in words makes the average length of
a word in text minimal and brings the amount of information
per average letter closer to the maximum. Thus, word coding
with letters is nearly optimal.

⁴⁵ On this matter, see N. Chomsky's review of R. O. Jakobson and
M. Halle's "Fundamentals of Language," in International Journal of
American Linguistics, Vol. 23, No. 3, 1957, pp. 234-241.

This  phenomenon  was  noted  some  time  ago  (see  bibliography 
for  Chapter  V,  [49]),  and  many  attempts  have  been  made  to 
give  some  reasonable  explanation  for  it.  The  simplest  explana- 
tion has  been  proposed  by  Miller  and  Newman  [33].  Let  us  as- 
sume that  words  originate  by  a  probabilistic  process— say,  by  a 
Markov  process.  The  test  consists  of  finding  whether  a  letter  or 
space  occurs.  The  probability  of  a  space  in  English  is  about  0.2, 
while the probability of a letter is 0.8. Thus, the process of
creating a word can be represented as in the scheme in Figure 11.⁴⁶


[Figure 11. Scheme for the Process of Word Creation. Two states,
L and S, with $P_S(L) = 1.0$ and $P_L(L) = 0.8$. Note: L denotes
letter; S denotes space; $P_L(L)$ is the probability of a letter
being followed by a letter.]


The probability of creating a concrete word i letters in length
by such a process is determined by the two following facts: (a)
the number of different chains of letters increases with increase
in the length of the chain; (b) the probability of creation of a
chain of corresponding length decreases with increase in the
length of the chain. Actually, the probability of creating a chain
i letters long is $(0.2)(0.8)^{i-1}$, and consequently the shorter
the chain is, the greater the probability. A result of these two
facts is the inverse dependency between the frequency of a word and
the number of letters contained in it. Thus, the optimum
relation between word frequency and word length emerges as a
necessary consequence, if one assumes that words are created by a
random process of the type already described; in other words,
this relation is the result of the fact that the location of a
space in a text is accidental. In fact, as Miller has shown,
an analogous relation between length and probability for a
"word" obtains if the "word" is defined as an interval between
two occurrences of the letter e in English text (or rather, the
dependency holds up even more perfectly in this case).

⁴⁶ [The numbers in the diagram make the probability of a space
about 0.167, not 0.2. Tr.]
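
The two-state scheme and the length formula can be checked numerically. The sketch below assumes only the transition probabilities given in Figure 11; the function names are illustrative. It also reproduces the translators' observation that the diagram implies a space probability of about 0.167 rather than 0.2.

```python
P_LETTER_AFTER_LETTER = 0.8   # P_L(L) from Figure 11
P_LETTER_AFTER_SPACE = 1.0    # P_S(L) from Figure 11

def chain_length_probability(i, p_continue=P_LETTER_AFTER_LETTER):
    """Probability that a chain is exactly i letters long:
    (1 - p) * p**(i - 1), i.e. (0.2)(0.8)**(i - 1) for p = 0.8."""
    return (1.0 - p_continue) * p_continue ** (i - 1)

def stationary_space_probability(p_ll=P_LETTER_AFTER_LETTER,
                                 p_sl=P_LETTER_AFTER_SPACE):
    """Long-run share of spaces among all symbols in the two-state chain.

    Balance condition: pi_S * p_sl = pi_L * p_ls, with pi_S + pi_L = 1.
    """
    p_ls = 1.0 - p_ll   # letter followed by space
    return p_ls / (p_ls + p_sl)

space_share = stationary_space_probability()            # 1/6, about 0.167
total_prob = sum(chain_length_probability(i) for i in range(1, 2000))
mean_length = sum(i * chain_length_probability(i) for i in range(1, 2000))
```

The length probabilities form a geometric distribution, so they sum to 1 and give a mean chain length of 1/0.2 = 5 letters under these assumptions.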

Miller's  important  results  are  as  follows.  It  turns  out  that  the 
optimum  relation  between  a  word's  text  frequency  and  its 
length  is  primarily  due  to  auxiliaries.  If  two  graphs  are  con- 
structed, one  showing  the  relationship  between  word  frequency 
and  word  length  for  auxiliaries  and  one  for  full-meaning 
words,⁴⁷ then the approach to inverse proportionality will be
better  for  auxiliaries  than  it  is  for  all  words  taken  as  a  whole, 
while  for  full-meaning  words,  frequency  and  length  are  almost 
independent.  There  is  as  yet  no  meaningful  explanation  of  this 
fact. 


9.  Limits  on  the  Possible  Applications  of  Information 
Theory  to  Language 

To summarize, regarding the possibilities of applying
information theory to linguistic problems, one can only say the
following: Because of the peculiarities of mathematical information
theory, only the formal or, more precisely speaking, the code
aspects of language can be studied by its methods. Therefore,
the greatest value of information theory lies in its application
to the study of phonological and "letter" aspects of language.
Also, these aspects are the most suitable for application of the
ideas of information theory, because any model representing
the process of creation of a message as the "emission" of one
symbol after another in a linear sequence, for all levels of
language besides the two indicated above, is not very productive⁴⁸
(take, for example, the complex hierarchical structure of a
sentence at the syntactic level).

⁴⁷ The criteria existing in linguistics for distinguishing
auxiliaries from full-meaning words are recognized by the authors to
be inexact, and a complete list of the words considered auxiliary
for the purpose of the counts is presented in their work. Several of
these words cannot help but elicit surprise. We note, for example,
that complex numerals such as twenty-seven are considered
auxiliaries and are written without the space. The authors assert,
however, that their choice was not at all based on such
considerations as the lengths or frequencies of words.

It  may  be  that  R.  Carnap  and  Y.  Bar-Hillel's  semantic  infor- 
mation theory  ([13],  [14],  [22])  can  play  a  large  role  in  fu- 
ture studies  of  the  semantic  aspect  of  language.  This  theory  is 
based  on  R.  Carnap's  concept  of  inductive  probability,  and  it 
develops  several  special  questions  in  logical  semantics.  The  the- 
ory considers  semantic  questions  in  artificial  languages  con- 
structed for  the  purpose  of  describing  a  narrowly  limited  situa- 
tion for  which  one  can  enumerate  all  the  objects,  their  possible 
properties,  and  the  logical  bonds  among  elementary  assertions 
about  the  properties  of  the  objects.  It  may  be  that  semantic  in- 
formation theory  can  be  used  for  future  study  of  natural  lan- 
guages, at  least  within  limited  spheres  of  their  application. 